Considerations for the Linear Estimation of a Regression Function

When the Data are Correlated

D. B. Holiday, T. E. Wehrly and J. D. Hart
TEXAS A&M UNIVERSITY

Abstract

In fixed-design kernel nonparametric regression, there has been
a paucity of results for models which allow for correlated errors.
Consider the following repeated-measurements model, applicable in
growth curve analysis: $Y_s(x_t) = g(x_t) + \epsilon_s(x_t)$, $s = 1,\ldots,m$
(e.g., subjects), $t = 1,\ldots,n$ (e.g., time points), with errors of zero
mean and within-subject covariance matrix $\Sigma$. More specifically, we
assume that $\mathrm{cov}[\epsilon_s(x_t), \epsilon_u(x_v)] = \delta_{su}\,\sigma(x_t,x_v)$, where $\delta_{su}$ is the
Kronecker delta and $\sigma(x_t,x_v)$ is the $(t,v)$th element of $\Sigma$.
Furthermore, it is assumed that $\sigma(x_t,x_v)$ may be represented as the
product of a scalar variance term and a suitably restricted
correlation function $\gamma(x_t - x_v)$. Kernel estimators of the population
regression function $g(x)$ are examined for specific as well as more
general correlation functions. Limiting forms of an optimal linear
combination of the subject means (and its measure of error) are
derived. Necessary and sufficient conditions for consistency are
stated for a general linear estimator for the Ornstein-Uhlenbeck
correlation function, and sufficient conditions are given for a more
general covariance structure. A numerical study investigating the
requisite amount of smoothing and the efficiency of four popular
kernel estimators is carried out. The expected values of the
estimators are compared against one another and against the optimal
linear combination that minimizes a certain measure of error.

1. INTRODUCTION


1.1  The  Model 

Nonparametric  regression  has  received  considerable  attention  in  the 
literature  in  the  last  several  years.  The  usual  model  in  the  fixed- 
design  case  is 


$$Y_i = g(x_i) + \epsilon_i, \quad i = 1,\ldots,n; \tag{1.1.1}$$

$$E(\epsilon_i) = 0, \quad \mathrm{Var}(\epsilon_i) = \sigma^2 < \infty; \tag{1.1.2}$$

$$\epsilon_1,\ldots,\epsilon_n \ \text{uncorrelated}; \tag{1.1.3}$$

$$x_1,\ldots,x_n \ \text{selected by the experimenter}. \tag{1.1.4}$$


The goal is the estimation of the unknown regression function $g(x)$
from a sample $(x_1,Y_1),\ldots,(x_n,Y_n)$. More recently, the estimation of
$g^{(p)}(x)$, the $p$th derivative of $g(x)$, has also become an important
consideration,  especially  in  growth  curve  applications.  The  term 
nonparametric  derives  from  the  lack  of  a  finite-dimensional 
parameterization  of  the  regression  function  and  not  from  any 
distribution-free  assumption  on  the  part  of  the  error  terms.  We  will 
not  be  concerned  with  distributional  assumptions  and  tests  of 
hypotheses,  the  usual  fare  for  parametric  models.  Certain  smoothness 
constraints  will  be  the  only  assumptions  made  for  g. 

Our aim is to investigate the estimation of $g^{(p)}(x)$, $p = 0,1,2,\ldots$,
in a model that relaxes the assumption of uncorrelated errors. Such a
model has been investigated by relatively few researchers. If the
correlation matrix of the errors were known, the problem of estimating
$g$ would be greatly simplified. Without some method of estimating an
unknown correlation structure (independent of the estimate of $g$), it
would be difficult to assess how much of the data smoothness is due
to $g$ and how much is due to correlation. In a repeated-measurements
model, however, one would possess enough observations to estimate the
correlation independently of the estimate of $g$. We will therefore in
this paper investigate a model which lends itself to such analysis:

$$Y_s(x_t) = g(x_t) + \epsilon_s(x_t); \tag{1.1.5}$$

$$s = 1,2,\ldots,m, \quad t = 1,2,\ldots,n;$$

$$E[\epsilon_s(x_t)] = 0, \quad \text{all } s,t;$$

$$\mathrm{cov}[\epsilon_s(x_t), \epsilon_u(x_v)] = \delta_{su}\,\sigma(x_t,x_v).$$

The quantity $\delta_{su}$ is the Kronecker delta and $\sigma(x_t,x_v)$ is the
covariance of errors with the same initial index ($s = u$). This is the
type of model which would naturally arise in the growth-curve
setting, an important problem in biological applications (see, for
example, Grizzle and Allen, 1969 and Morrison, 1970). Consider a sample
of $m$ subjects, behaving independently, and measured for response $Y$ at
times $x_1,\ldots,x_n$. The regression function $g(x)$ would represent the
mean of the entire population of subjects from which we have a random
sample of size $m$. The first and second derivatives of $g(x)$ are of
interest since they represent the velocity and acceleration,
respectively, of population growth. Since it is likely that the error
term for a subject measured at time $x_t$ is (positively) correlated
with the error term at time $x_{t+1}$, we will have a set of correlated
errors for that particular subject. The fact that the subjects are
behaving independently would lead to an assumption of zero
correlation across subjects. Most approaches assume that the subjects
possess the same within-subject correlation structure. We will assume
a certain form for the covariance term $\sigma(x_t,x_v)$, which in effect will
be equivalent to homoscedastic and stationary errors. In particular,
it will be assumed that the correlation may be written as a function
of the difference in $x_t$ and $x_v$. Homoscedasticity simply means that
the variance of errors is constant over all points $x$ in the range of
interest. Stationarity implies that the correlation of the errors at
different points only depends on the absolute distance between the
points and not on their particular location. In the next chapter we
will spell out the model in greater detail.
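To make the model (1.1.5) concrete, the following sketch simulates repeated measurements for $m$ independent subjects. It is only an illustration under assumptions that this section does not make: the design is taken to be equally spaced on [0,1], the errors are Gaussian, and the within-subject correlation is exponential in the distance between design points. The function names and the parameters `sigma`, `alpha` and `seed` are hypothetical.

```python
import math
import random


def simulate_repeated_measurements(g, m, n, sigma, alpha, seed=0):
    """Return (x, Y) with Y[s][t] = g(x[t]) + eps_s(x[t]).

    Assumptions (illustrative only): Gaussian errors with variance sigma**2
    and correlation exp(-alpha * |x_t - x_v|) within a subject, generated by
    a first-order autoregressive recursion; subjects are independent.
    """
    rng = random.Random(seed)
    x = [t / n for t in range(n)]            # equally spaced design points
    rho_n = math.exp(-alpha / n)             # correlation of adjacent errors
    innov_sd = sigma * math.sqrt(1.0 - rho_n ** 2)
    Y = []
    for _ in range(m):                       # one row per subject
        eps = rng.gauss(0.0, sigma)          # stationary starting value
        row = [g(x[0]) + eps]
        for t in range(1, n):
            eps = rho_n * eps + rng.gauss(0.0, innov_sd)
            row.append(g(x[t]) + eps)
        Y.append(row)
    return x, Y


def subject_means(Y):
    """Average the m subjects at each design point."""
    m = len(Y)
    return [sum(col) / m for col in zip(*Y)]
```

Because errors are uncorrelated across subjects, averaging over subjects reduces the error variance at each design point by the factor $1/m$ while leaving the within-subject correlation pattern unchanged.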

1.2  The  Estimators 

In  the  traditional  model  (1.1.1)  there  have  been  various  approaches 
to  the  nonparametric  estimation  of  g.  Many  evolved  from  the 
estimation  of  the  conditional  regression  function 

$$g^*(x) = E(Y \mid X = x). \tag{1.2.1}$$

In this setting the design points $x_1,\ldots,x_n$ are not selected prior to
the collection of the data, as in the fixed-design model. Instead, we
have a random sample $(X_1,Y_1),\ldots,(X_n,Y_n)$ from a bivariate
distribution with density $f(x,y)$, say. Let $f_X(x)$ denote the marginal
density of $X$, that is,

$$f_X(x) = \int_{-\infty}^{\infty} f(x,y)\,dy. \tag{1.2.2}$$

Then the conditional density of $Y$ given $X = x$ is

$$f_{Y|X}(y \mid x) = f(x,y)/f_X(x), \tag{1.2.3}$$

provided $f_X(x) > 0$. The conditional mean (1.2.1), the regression of
$Y$ on $X$, is then

$$g^*(x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\,dy = \int_{-\infty}^{\infty} y\,[f(x,y)/f_X(x)]\,dy. \tag{1.2.4}$$

Watson (1964) was the first to realize the connection between density
estimation and the functional (1.2.4). The estimator

$$g_n^{NW}(x) = \sum_{i=1}^{n} Y_i\, K[(x-x_i)/h] \Big/ \sum_{i=1}^{n} K[(x-x_i)/h] \tag{1.2.5}$$

was also independently proposed by Nadaraya (1964) and hence is known
as the Nadaraya-Watson (NW) estimator. The function $K$ is known as a
kernel and was originally restricted to be a probability density
function. The quantity $h$ is known as the bandwidth and controls the
amount of smoothing done by the estimator. A large bandwidth gives
greater influence (weight) to $Y$ values corresponding to $X$ values less
local to the point of estimation $x$, thereby resulting in a smoother
estimate. The value of $h > 0$ and the kernel function $K$ are
user-supplied.
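The role of the bandwidth in the Nadaraya-Watson estimator can be seen in a small sketch. The Gaussian kernel and the function names below are illustrative choices, not something the text prescribes; any probability density could be substituted for the kernel.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an example kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, xs, ys, h, kernel=gaussian_kernel):
    """Kernel-weighted average of ys at the point x with bandwidth h > 0."""
    weights = [kernel((x - xi) / h) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; increase h")
    # The normalized weights sum to one, so this is a true weighted average.
    return sum(w * y for w, y in zip(weights, ys)) / total
```

With a very small `h` the estimate at a design point is essentially the observation at that point; as `h` grows the estimate moves toward the overall average of the responses.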

Kernel methodology has its roots in the nonparametric estimation
of an unknown continuous probability density function, $f(x)$, given
data. No paper on nonparametric estimation of functions
would be complete without reference to the classic papers of
Parzen (1962a) and Rosenblatt (1956), which form the basis of kernel
estimation.

In the fixed-design model (1.1.1), Priestley and Chao (1972)
were the first to propose the following kernel estimate, which bears
their name:

$$g_n^{PC}(x) = (1/h) \sum_{i=1}^{n} Y_i\,(x_i - x_{i-1})\, K[(x-x_i)/h]. \tag{1.2.6}$$

This estimate has been widely studied, as will be seen in the
references of Section 1.5. Many variations of this estimator have
been proposed, most notably the Gasser-Muller (GM) estimate (Gasser
and Muller, 1979). These authors, using an argument based on the mean
value theorem, propose

$$g_n^{GM}(x) = \sum_{i=1}^{n} Y_i \left\{ (1/h) \int_{A_i} K[(x-u)/h]\,du \right\}, \tag{1.2.7}$$

where $A_i = [s_{i-1}, s_i]$ with the $s_i$-values chosen to satisfy
$x_i \le s_i \le x_{i+1}$. Notice that these estimators are all of the linear
form

$$g_n(x) = \sum_{i=1}^{n} w_{ni}(x)\, Y_i \tag{1.2.8}$$

for appropriate choices of $w_{ni}(x)$. Also observe that the weights in
(1.2.5) sum to one, and therefore the NW estimate is a true weighted
average. Even though originally proposed for conditional mean
estimation, the NW estimate is commonly used in the fixed-design
model. If the design points are equally spaced, the NW estimate may
be obtained from the PC estimate by dividing by the sum of the
weights in (1.2.6). Such estimators are said to be "cut-and-normalized",
and often reduce the bias of the estimator, especially
if predicting at an $x$ near the boundary of the range of interest.
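The linear form (1.2.8) and the cut-and-normalize step can be sketched as follows. This is a minimal illustration under several assumptions of ours: a Gaussian kernel, an equally spaced design on [0,1], and Gasser-Muller interval endpoints taken as midpoints between adjacent design points with the outer endpoints set to 0 and 1. All function names are hypothetical.

```python
import math


def gauss_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gauss_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def pc_weights(x, xs, h):
    """Priestley-Chao weights for an equally spaced design."""
    delta = xs[1] - xs[0]                      # common spacing x_i - x_{i-1}
    return [delta * gauss_pdf((x - xi) / h) / h for xi in xs]


def gm_weights(x, xs, h, lo=0.0, hi=1.0):
    """Gasser-Muller weights: (1/h) * integral of K[(x-u)/h] over each interval."""
    s = [lo] + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])] + [hi]
    return [gauss_cdf((x - s[i]) / h) - gauss_cdf((x - s[i + 1]) / h)
            for i in range(len(xs))]


def normalize(weights):
    """Cut-and-normalize: rescale the weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def linear_estimate(weights, ys):
    """Evaluate the linear form: sum of w_ni(x) * Y_i."""
    return sum(w * y for w, y in zip(weights, ys))
```

For a constant response the unnormalized Priestley-Chao estimate at the boundary point $x = 0$ falls well short of the constant, because roughly half of the kernel mass lies outside the design region; the normalized version recovers the constant exactly, which is the bias reduction described above.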
In this paper we will consider only properties of an optimal
linear estimate (1.2.8) and of kernel-based estimates along with
their cut-and-normalized versions for equally-spaced design points.
With the addition of the cut-and-normalized GM estimate, which we
will denote GC, there are then four kernel estimates: PC, GM, NW, GC.
When estimating $g(x)$ in our repeated-measurements model, the
$Y_t$-values will be replaced by the subject means $\bar{Y}(x_t)$, $t = 1,\ldots,n$.

Other  methods  of  nonparametric  estimation  of  the  regression 
function  which  have  attained  stature  include  the  method  of  stochastic 
approximation,  nearest-neighbor  estimation,  the  regressogram,  and  the 
chief  competitor  of  kernel  estimation  —  splines.  The  scope  of  this 
paper,  however,  is  restricted  to  kernel  estimation.  The  reader  is 
referred  to  Prakasa  Rao  (1983)  for  a  brief  summary  of  the  above 
methods. 

1.3  The  Problem 

Given the repeated-measurements model (1.1.5), the foremost aim of
this research is to assess the performance of the four kernel
estimators PC, GM, NW, and GC when the only allowance made for
correlation is bandwidth adjustment. Pursuant to this goal, the
estimators will be compared with one another and with some standard
which represents the best possible linear estimator one could use. Of
course, we will have to define what is meant by the term "best". The
loss in performance of the kernel estimators for a given variance and
correlation situation may point to a need to incorporate some
estimate of the correlation into the estimator (à la generalized
least squares), or possibly to abandon kernel estimation completely
in favor of one of the competing methods mentioned earlier. We will
therefore need to obtain and investigate some optimal linear
combination of the subject means to afford a standard for
comparisons. Limiting expressions of this linear estimator will shed
light onto consistency properties as well as suggest a possible
modification of the kernel function. We will deal with these goals in
Chapter 2, which will focus on general linear estimation, not
restricted to kernel form.

Chapter  3  will  review  important  results  in  kernel  estimation  of 
$g^{(0)}(x)$, including the recent work of Hart and Wehrly (1986). These
authors  consider  the  same  repeated-measurements  model  and  develop 
results  primarily  for  the  GM  estimator  under  fairly  general 
correlation  structures.  Chapter  3  will  also  describe  the  results  of  a 
numerical  study  designed  to  measure  the  efficacy  of  kernel  estimates 
relative  to  the  best  linear  combination.  A  variety  of  functions  and 
variance-correlation  settings  will  be  considered.  The  uncorrelated 
case  will  be  included  for  comparison.  The  optimal  amount  of  smoothing 
selected  by  the  estimators  (according  to  minimization  of  a  measure  of 
global  error)  will  be  a  by-product  of  the  study  and  may  be  used  to 
study  bandwidth  variation  as  a  function  of  the  amount  of  variance  and 
correlation  present  in  the  model. 


1.4  Some  Theoretical  Terms  and  Definitions 


For  the  benefit  of  the  reader  who  is  unfamiliar  with  the  nature  of 
the  research  being  done  in  this  area,  we  will  give  a  sampling  of 
technical definitions which are common in the literature, which is
reviewed  in  the  subsequent  section.  Many  of  the  results  pertain  to 
verification  of  certain  asymptotic  (large  sample)  properties  for 
various  estimators.  These  include  probability  bounds  on  the  error  of 
estimation,  asymptotic  normality,  and  various  modes  of  consistency, 
many  of  which  are  given  below. 


Let $g_n(x)$ denote an estimator of $g(x)$ in a model similar to (1.1.1).

We first define two criteria for nearness of the estimator to the
function:

Definition (1.4.1) The quantity

$$\mathrm{MSE}[g_n(x)] = E[g_n(x) - g(x)]^2 \tag{1.4.1}$$

is known as the mean squared error of $g_n$ at the point $x$.

Definition (1.4.2) The quantity

$$\mathrm{MISE}[g_n,g] = E \int [g_n(x) - g(x)]^2\,dx = \int E[g_n(x) - g(x)]^2\,dx \tag{1.4.2}$$

is known as the mean integrated squared error of $g_n$ with respect to
$g$.

Letting $G^*$ denote a space of functions, the following modes of
convergence are defined for a sequence $g_n$ of estimators:

Definition (1.4.3) Consistency in Quadratic Mean.

For every $x$ and $g \in G^*$, $E[g_n(x) - g(x)]^2 \to 0$, as $n \to \infty$. (1.4.3)

Definition (1.4.4) Integratedly Consistent in Quadratic Mean.

For every $g \in G^*$, $E\int [g_n(x) - g(x)]^2\,dx \to 0$, as $n \to \infty$. (1.4.4)

Definition (1.4.5) Weakly Consistent.

For every $x$ and $g \in G^*$, $g_n(x) \xrightarrow{p} g(x)$, as $n \to \infty$. (1.4.5)

Definition (1.4.6) Uniformly Weakly Consistent.

For every $g \in G^*$, $\sup_x |g_n(x) - g(x)| \xrightarrow{p} 0$, as $n \to \infty$. (1.4.6)

Definition (1.4.7) Strongly Consistent.

For every $x$ and $g \in G^*$, $g_n(x) \xrightarrow{wp1} g(x)$, as $n \to \infty$. (1.4.7)

Definition (1.4.8) Uniformly Strongly Consistent.

For every $g \in G^*$, $\sup_x |g_n(x) - g(x)| \xrightarrow{wp1} 0$, as $n \to \infty$. (1.4.8)

Definition (1.4.9) Asymptotically Unbiased.

For every $x$ and $g \in G^*$, $E[g_n(x)] \to g(x)$, as $n \to \infty$. (1.4.9)

Definition (1.4.10) Uniformly Asymptotically Unbiased.

For every $g \in G^*$, $\sup_x |E[g_n(x)] - g(x)| \to 0$, as $n \to \infty$. (1.4.10)

As  in  density  estimation,  there  exist  classes  of  functions  for 
which  unbiased  estimators  do  not  exist  and  for  which  uniformly 
consistent  estimators  do  not  exist.  The  reader  is  referred  to  Prakasa 
Rao  (1983)  for  more  details. 

We  will  deal  primarily  with  pointwise  consistency  in  mean 
squared  error  (1.4.3)  and  global  consistency  in  mean  integrated 
squared  error  (1.4.4).  Since  the  n-fold  integral  in  (1.4.4)  presents 
analytic  and  computational  problems,  we  will  later  introduce  a 
discrete  analog  known  as  the  mean  averaged  squared  error. 

1.5  A  Review  of  Selected  Literature 

The  literature  on  nonparametric  regression  is  presently  developing  at 
such  a  rate  that  the  body  of  knowledge  will  soon  rival  that  of 
nonparametric  density  estimation.  Of  course,  this  is  to  be  expected 
since  most  nonparametric  methods  are  adapted  and  motivated  from 
existing  density  estimation  methodology.  For  the  reader  not  familiar 
with  density  estimation,  Tapia  and  Thompson  (1978)  contains  a  useful 
survey.  Prakasa  Rao  (1983)  not  only  provides  a  comprehensive  review 
of  density  estimation,  but  also  reviews  estimation  of  functionals 
related  to  densities,  which  includes  the  regression  function. 

Collomb (1981, 1985) and Stone (1977) also provide extensive
bibliographies on nonparametric regression. The focus of the latter
paper is on estimation in the random regressor model (1.2.4). An
excellent and up-to-date survey of fixed-design methods (particularly
for splines) may be found in Eubank (1986).

The Nadaraya-Watson estimator is usually considered in the
context  of  the  random  regressor  model.  The  following  authors  have 
considered  this  estimator  and  its  variations:  Rosenblatt  (1969), 
Nadaraya  (1970,  1973,  1983a,  1983b),  Schuster  (1972),  Noda  (1976), 
Devroye  (1978),  Schuster  and  Yakowitz  (1979),  Devroye  and 
Wagner  (1980a,  1980b),  Spiegelman  and  Sacks  (1980),  Greblicki  and 
Krzyzak  (1980),  Mack  and  Silverman  (1982),  Johnston  (1982),  and 
others. 

As previously mentioned, the Gasser-Muller estimator is a
variation of the Priestley-Chao estimator. Authors considering forms
of these estimators include Benedetti (1974, 1975, 1977),
Clark (1977), Gasser and Muller (1979), Cheng and Lin (1981a, 1981b),
Georgiev (1984a, 1984b, 1984c, 1984d), Georgiev and Greblicki (1986),
and others. The last paper also contains useful results for arbitrary
linear estimators.

Gasser, Muller, Kohler, Molinari, and Prader (1984) consider
applications of kernel methodology, including derivative estimation,
to the estimation of individual growth curves. Azzalini (1984) and
Glasbey (1979) attack the growth-curve problem from a parametric
point of view. A linear model with autocorrelated errors is
considered by the former author while the latter author studies
estimation in a nonlinear regression model, also with an
autocorrelated error structure similar to ours. Diggle and
Hutchinson (1985) consider (nonparametric) spline estimation under an
autocorrelated errors model. An approach based on regression analysis
of a continuous parameter time series is considered in several papers
found in Parzen (1967).


2.  OPTIMAL  LINEAR  ESTIMATION 

2.1  Model  and  Estimator  Notation 

As mentioned in Chapter 1, the development of an optimal linear
combination of the data, not necessarily of kernel form, is now
considered. First the model and the assumptions are explicitly
stated:

$$Y_s(x_t) = g(x_t) + \epsilon_s(x_t);$$

$$s = 1,2,\ldots,m, \quad t = 1,2,\ldots,n;$$

$$E[\epsilon_s(x_t)] = 0, \quad \text{all } s,t;$$

$$\mathrm{cov}[\epsilon_s(x_t), \epsilon_u(x_v)] = \delta_{su}\,\sigma(x_t,x_v). \tag{2.1.1}$$

Let the vector of design points and corresponding true function
values be denoted by

$$\mathbf{x} = [x_1,\ldots,x_n]' \quad \text{and} \tag{2.1.2}$$

$$\mathbf{g} = [g(x_1),\ldots,g(x_n)]'. \tag{2.1.3}$$

When it suits our purpose, we will often use $x_i$ and $x_{ni}$
interchangeably to emphasize the dependence of a sequence of design
points on $n$. In our development we will usually use an equally-spaced
set of design points on [0,1], namely,

$$x_i = (i-1)/n, \tag{2.1.4}$$

which has spacing $1/n$ and often simplifies the arguments.


Let $Y$ and $\epsilon$ denote the ($m$ by $n$) data and error matrices,
respectively. In matrix notation, the model (2.1.1) may be expressed

$$Y = \mathbf{1}_m \mathbf{g}' + \epsilon, \quad E[\epsilon] = 0, \quad \mathrm{cov}[\mathrm{vec}(\epsilon')] = I_m \otimes \Sigma, \tag{2.1.5}$$

where $\mathbf{1}_m$ is an $m$-vector of 1's, vec is the stacking operator, and $\otimes$
denotes the Kronecker product. Note that the $m$ rows of $\epsilon$ are
uncorrelated with each other. Each row has ($n$ by $n$) covariance
structure $\Sigma$, whose $(i,j)$th element is $\sigma(x_i,x_j)$. In particular we will
be interested in the case when $\Sigma = \sigma^2\Gamma$, with the $(i,j)$th element of
the correlation matrix $\Gamma$ being $\gamma(x_i - x_j)$. Note that $\gamma$ is an arbitrary
correlation function assumed to satisfy certain properties as the
need arises. The specific example we will use is the
Ornstein-Uhlenbeck structure

$$\gamma(u) = \exp\{-\alpha|u|\}, \quad \alpha > 0, \tag{2.1.6}$$

which for $\rho > 0$ is a reparameterization of

$$\gamma(u) = \rho^{|u|}, \quad 0 < \rho < 1. \tag{2.1.7}$$

Therefore the uncorrelated case ($\rho = 0$) corresponds to $\alpha \to \infty$ and the
unit correlation ($\rho = 1$) corresponds to $\alpha \to 0$.
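The link between the two parameterizations can be written out in one line. This worked step is added for the reader and is not part of the original derivation:

```latex
\rho^{|u|} \;=\; \exp\{|u| \ln \rho\} \;=\; \exp\{-\alpha |u|\},
\qquad \alpha = -\ln \rho \;\Longleftrightarrow\; \rho = e^{-\alpha},
\qquad 0 < \rho < 1 .
```

Thus $\rho = \gamma(1)$ is the correlation between errors one unit apart; $\rho$ decreases to 0 as $\alpha \to \infty$ and increases to 1 as $\alpha \to 0$.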

For estimation of the regression function $g(x)$ at the point $x$,
we will entertain the estimator

$$g_n(x) = \sum_{i=1}^{n} w_{ni}(x)\, \bar{Y}(x_i), \tag{2.1.8}$$

where

$$\bar{Y}(x_i) = (1/m) \sum_{s=1}^{m} Y_s(x_i) \tag{2.1.9}$$

and $w_{ni}(x)$ is an arbitrary weight function which, in addition to
dependence on $x$, may also depend on the entire design vector (2.1.2).
As in (2.1.3), we denote

$$\mathbf{g}_n = [g_n(x_1),\ldots,g_n(x_n)]' \tag{2.1.10}$$

as the vector of estimated function values at the design points.

Notice that (2.1.8) is a special case of the more general form

$$g_n^*(x) = \sum_{i=1}^{n} \sum_{s=1}^{m} w_{ni}(x)\, d_{ms}\, Y_s(x_i) \tag{2.1.11}$$

with $d_{ms} = 1/m$, $s = 1,2,\ldots,m$. We will subsequently show that there
is no need to allow for the extra coefficients $d_{ms}$ in considering our
global optimality criterion.

2.2  Mean  Averaged  Squared  Error 

As a global measure of discrepancy between the vectors $\mathbf{g}_n$ (the
estimated values at the design points) and $\mathbf{g}$ (the true values at the
design points), we adopt the mean averaged squared error,

$$\mathrm{MASE}[\mathbf{g}_n,\mathbf{g}] = E\Big\{(1/n) \sum_{i=1}^{n} [g_n(x_i) - g(x_i)]^2\Big\}, \tag{2.2.1}$$

which is an approximation of the mean integrated squared error,

$$\mathrm{MISE}[g_n,g] = E \int [g_n(x) - g(x)]^2\,dx. \tag{2.2.2}$$

The MISE is dealt with in theoretical situations whereas it is much
more practical to use the MASE in computer approximations and
data-oriented applications. We should note in passing that

$$\mathrm{MASE}[\mathbf{g}_n,\mathbf{g}] \approx \mathrm{MISE}[g_n,g],$$

provided that we have an equally-spaced or asymptotically
equally-spaced design. To see this, let

$$F_n(x) = (1/n) \sum_{i=1}^{n} I\{x_i \le x\},$$

where $I\{\cdot\}$ is an indicator function. Then $F_n(x)$ could be considered a
sample cumulative distribution function converging to

$$F(x) = \int_{-\infty}^{x} f(t)\,dt,$$

where $f(t)$ is called a design density. We may choose the design
points as selected quantiles of this distribution or as a random
sample from this distribution, in which case $\mathrm{MASE}[\mathbf{g}_n,\mathbf{g}]$ may be
written

$$E \int [g_n(x) - g(x)]^2\,dF_n(x) \sim E \int [g_n(x) - g(x)]^2 f(x)\,dx, \quad \text{as } n \to \infty. \tag{2.2.3}$$

We see that the last expression in (2.2.3) defines a weighted version
of the MISE. Recalling the design points (2.1.4), we see that this
will lead to a uniform design density on [0,1] and hence the right
side of (2.2.3) corresponds to the usual uniformly-weighted MISE in
(2.2.2).
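The Riemann-sum relationship between the averaged and the integrated squared error can be checked numerically. The sketch below evaluates the averaged squared error on the design (2.1.4) for a fixed, non-random estimate, so no expectation is taken; the function name and the example functions are our own illustrative choices.

```python
def averaged_squared_error(estimate, truth, n):
    """(1/n) * sum over the design x_i = (i-1)/n of [estimate(x_i) - truth(x_i)]^2."""
    xs = [(i - 1) / n for i in range(1, n + 1)]
    return sum((estimate(x) - truth(x)) ** 2 for x in xs) / n
```

For example, with `estimate(x) = 2x` and `truth(x) = x` the pointwise squared error is $x^2$, whose integral over [0,1] is 1/3, and the averaged squared error approaches this value as `n` grows.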

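Let $W_n$ denote the matrix whose $(i,j)$th element is $w_{ni}(x_j)$. Then
we may write $\mathbf{g}_n$ as

$$\mathbf{g}_n = (1/m)\, W_n' Y' \mathbf{1}_m \tag{2.2.4}$$

using notation of the previous section. The matrix equivalent of the
general form (2.1.11) is

$$\mathbf{g}_n^* = W_n' Y' \mathbf{d}_m,$$

where $\mathbf{d}_m = [d_{m1},\ldots,d_{mm}]'$ satisfies $\mathbf{1}_m'\mathbf{d}_m = 1$. In what follows,
$G = \mathbf{g}\mathbf{g}'$ and $\Sigma_m = \Sigma/m$.

(i) $M(W_n) = \mathrm{MASE}[\mathbf{g}_n^*,\mathbf{g}] = (1/n)\,\mathrm{tr}\{(\mathbf{d}_m'\mathbf{d}_m)\, W_n'\Sigma W_n + (W_n - I)'G(W_n - I)\}$. (2.2.7)

(ii) The optimal choice of $\mathbf{d}_m$ is

$$\mathbf{d}_m^* = (1/m)\mathbf{1}_m. \tag{2.2.8}$$

(iii) The optimal choice of $W_n$ is

$$W_n^* = (\Sigma_m + G)^{-1}G = [1 + \mathbf{g}'\Sigma_m^{-1}\mathbf{g}]^{-1}\, \Sigma_m^{-1}\mathbf{g}\mathbf{g}'. \tag{2.2.9}$$

(iv) The minimal MASE, $M(W_n^*)$, is

$$[(\mathbf{g}'\mathbf{g})/n][1 - \mathrm{tr}(W_n^*)] = [(\mathbf{g}'\mathbf{g})/n][1 - \mathbf{g}'(\Sigma_m + G)^{-1}\mathbf{g}]. \tag{2.2.10}$$

Proof: For (i) we note that

$$n \cdot \mathrm{MASE}[\mathbf{g}_n^*,\mathbf{g}] = E\|\mathbf{g}_n^* - \mathbf{g}\|^2$$

$$= E\|W_n' Y' \mathbf{d}_m - \mathbf{g}\|^2$$

$$= E\|W_n'\epsilon'\mathbf{d}_m + (W_n - I)'\mathbf{g}\|^2$$

$$= E\|W_n'\epsilon'\mathbf{d}_m\|^2 + \|(W_n - I)'\mathbf{g}\|^2 \quad \text{(where cross products vanish)}$$

$$= E[\mathrm{tr}\{\mathbf{d}_m'\epsilon W_n W_n'\epsilon'\mathbf{d}_m\}] + \mathrm{tr}\{\mathbf{g}'(W_n - I)(W_n - I)'\mathbf{g}\}$$

$$= E[\mathrm{tr}\{(W_n'\epsilon'\mathbf{d}_m)(W_n'\epsilon'\mathbf{d}_m)'\}] + \mathrm{tr}\{(W_n - I)'\mathbf{g}\mathbf{g}'(W_n - I)\}$$

$$= \mathrm{tr}\{\mathrm{var}(W_n'\epsilon'\mathbf{d}_m)\} + \mathrm{tr}\{(W_n - I)'G(W_n - I)\}.$$

Recall that the rows of $\epsilon$ are uncorrelated and have common covariance
matrix $\Sigma$. Let $\epsilon' = [\epsilon_1,\ldots,\epsilon_m]$ denote the $m$ uncorrelated columns.

Then

$$W_n'\epsilon'\mathbf{d}_m = \sum_{s=1}^{m} W_n'\epsilon_s\, d_{ms}, \tag{2.2.11}$$

which is a linear combination of the uncorrelated quantities $W_n'\epsilon_s$.
Hence

$$\mathrm{var}[W_n'\epsilon'\mathbf{d}_m] = \sum_{s=1}^{m} d_{ms}^2\, \mathrm{var}[W_n'\epsilon_s] = (\mathbf{d}_m'\mathbf{d}_m)\, W_n'\Sigma W_n. \tag{2.2.12}$$

Equation (2.2.7) immediately follows upon substitution into the last
expression for MASE.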
For (ii) it is widely known that the solution to the
minimization problem

$$\min\ \mathbf{d}_m'\mathbf{d}_m \quad \text{s.t.} \quad \mathbf{1}_m'\mathbf{d}_m = 1 \tag{2.2.13}$$

is $\mathbf{d}_m^* = (1/m)\mathbf{1}_m$. Since this value is an attainable lower bound
independent of $W_n$, this is also the solution to the original
question, due to the nonnegativity of the other quantities in
(2.2.7). Note that the first MASE term represents the contribution
due to the variance and the last term represents the squared bias.

For (iii) we will substitute the optimal $\mathbf{d}_m$ (2.2.8) into
(2.2.7), obtaining

$$\mathrm{MASE}[\mathbf{g}_n,\mathbf{g}] = M(W_n) = (1/n)\,\mathrm{tr}\{W_n'\Sigma_m W_n + (W_n - I)'G(W_n - I)\}. \tag{2.2.14}$$

Due to the symmetry of $G$ and the properties of traces we may write
this as

$$M(W_n) = (1/n)\,\mathrm{tr}\{W_n'(\Sigma_m + G)W_n - 2W_n'G + G\}. \tag{2.2.15}$$

To find the optimal value of $W_n$, we may proceed in either of two
directions. We may use the properties of matrix calculus to
differentiate the trace directly, or we may express the MASE as a
quadratic form of a Kronecker product, and then take the familiar
derivative. Following the former approach, we obtain, apart from the
constant factor $1/n$,

$$\partial M(W_n)/\partial W_n = [(\Sigma_m + G) + (\Sigma_m + G)']W_n - 2G. \tag{2.2.16}$$

Upon setting this value equal to zero, the result immediately
follows. Note that the fact that $(\Sigma_m + G)$ is positive definite will
guarantee that (2.2.15) is indeed minimized. The alternate expression
in (2.2.9) for $W_n^*$ follows from the matrix fact
$$(I + AB)^{-1} = I - A(I + BA)^{-1}B, \tag{2.2.17}$$

by factoring out $\Sigma_m^{-1}$ and taking $A = \mathbf{g}$ and $B = \mathbf{g}'\Sigma_m^{-1}$. The calculations
are straightforward but tedious and hence omitted.

For (iv), the result is found easily by substituting (2.2.9)
into (2.2.15) and using the well-known property $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.


In light of (2.2.8), we will only consider estimators of the
form (2.1.8) and not of the more general form (2.1.11). We also note
that if one computes $W_n^*$, then the left side of (2.2.10) is an
efficient formula for calculating the minimal MASE for a known
regression function since it involves fewer operations than the other
formulas.
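The optimal weights (2.2.9) and the minimal MASE (2.2.10) can be checked numerically for a small design. The sketch below is a verification aid under our own assumptions: an exponential correlation with hypothetical parameter `alpha`, the equally spaced design (2.1.4), and a plain Gaussian elimination routine standing in for a linear algebra library. All function names are illustrative.

```python
import math


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def ou_sigma_m(n, m, sigma2, alpha):
    """Sigma_m = (sigma2 / m) * Gamma with Gamma_ij = exp(-alpha * |i - j| / n)."""
    return [[sigma2 / m * math.exp(-alpha * abs(i - j) / n) for j in range(n)]
            for i in range(n)]


def mase(W, Sigma_m, g):
    """M(W) = (1/n) tr{W' Sigma_m W + (W - I)' g g' (W - I)}."""
    n = len(g)
    variance = 0.0
    bias2 = 0.0
    for j in range(n):
        col = [W[i][j] for i in range(n)]          # j-th column of W
        variance += sum(col[i] * Sigma_m[i][k] * col[k]
                        for i in range(n) for k in range(n))
        b = sum(col[i] * g[i] for i in range(n)) - g[j]   # j-th entry of (W - I)' g
        bias2 += b * b
    return (variance + bias2) / n


def optimal_weights(Sigma_m, g):
    """Return (W*, q) with W* = Sigma_m^{-1} g g' / (1 + q), q = g' Sigma_m^{-1} g."""
    v = solve(Sigma_m, g)
    q = sum(vi * gi for vi, gi in zip(v, g))
    W = [[vi * gj / (1.0 + q) for gj in g] for vi in v]
    return W, q
```

Evaluating `mase` at the matrix returned by `optimal_weights` should agree with both forms of the minimal MASE, and perturbing any entry of that matrix should increase the criterion, since the quadratic in (2.2.15) is strictly convex.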


2.3  General  Conditions  Sufficient  for  Consistency 


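Using (2.2.9), $M(W_n^*)$, the measure of error evaluated at the optimum,
may be expressed as

$$M(W_n^*) = [(\mathbf{g}'\mathbf{g})/n][1 - \mathrm{tr}(W_n^*)] = [(\mathbf{g}'\mathbf{g})/n][1 + \mathbf{g}'\Sigma_m^{-1}\mathbf{g}]^{-1}. \tag{2.3.1}$$

By properties of Riemann sums,

$$(\mathbf{g}'\mathbf{g})/n \to \int_0^1 g^2(x)\,dx, \quad \text{as } n \to \infty, \tag{2.3.2}$$

which is finite if $g$ is continuous on [0,1]. From (2.3.1) we see that
in order for $M(W_n^*) \to 0$ as $n \to \infty$ (with $m$ bounded), we will need

$$\mathbf{g}'\Sigma_m^{-1}\mathbf{g} = (m/\sigma^2)\,\mathbf{g}'\Gamma^{-1}\mathbf{g} \to \infty, \quad \text{as } n \to \infty, \tag{2.3.3}$$

unless $m$ is allowed to tend to infinity.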


In the following argument, let

$$e_{n1}(A) \ge \cdots \ge e_{nn}(A) \tag{2.3.4}$$

denote the descending eigenvalues of an ($n$ by $n$) positive definite
matrix $A$.


Theorem (2.3.1) Suppose $g$ is continuous on [0,1] and not identically
0. If $m \to \infty$ or if $n/e_{n1}(\Gamma) \to \infty$ as $n \to \infty$, then $M(W_n^*) \to 0$.

Proof: In light of (2.3.3) the result will follow by showing
$\mathbf{g}'\Sigma_m^{-1}\mathbf{g} \to \infty$. Using the optimization lemmas for positive definite
quadratic forms,

$$\mathbf{g}'\Sigma_m^{-1}\mathbf{g} = (m/\sigma^2)\,\mathbf{g}'\Gamma^{-1}\mathbf{g} \ge (m/\sigma^2)\,\mathbf{g}'\mathbf{g}\; e_{nn}(\Gamma^{-1}). \tag{2.3.5}$$

By the spectral decomposition, the eigenvalues of $\Gamma^{-1}$ are positive
and are the reciprocals of the eigenvalues of $\Gamma$, so that
$e_{nn}(\Gamma^{-1}) = 1/e_{n1}(\Gamma)$. Furthermore,
$n/e_{n1}(\Gamma)$ is greater than one for all $n$ since $e_{n1}(\Gamma) < \sum_{i=1}^{n} e_{ni}(\Gamma)
= \mathrm{tr}(\Gamma) = n$.

Hence

$$\mathbf{g}'\Sigma_m^{-1}\mathbf{g} \ge \sigma^{-2}\, mn\,[(\mathbf{g}'\mathbf{g})/n]/e_{n1}(\Gamma) \tag{2.3.6}$$

$$\sim \sigma^{-2} \int_0^1 g^2(x)\,dx\; (mn/e_{n1}(\Gamma)), \tag{2.3.7}$$

from which the result clearly follows since the integral is strictly
positive. ■

In a subsequent section, the eigenvalue condition of this theorem
will be related to the properties of the spectral density associated
with the Ornstein-Uhlenbeck correlation function.


2.4  Results  for  the  Ornstein-Uhlenbeck  Structure 

In this section we consider applications of the previous results
in the case of the Ornstein-Uhlenbeck covariance structure (2.1.6) in
the equally-spaced case (2.1.4). Let

$$\rho = \exp(-\alpha), \quad \alpha > 0 \tag{2.4.1}$$

be a nonnegative constant and define

$$\rho_n = \rho^{1/n} = \exp(-\alpha/n), \tag{2.4.2}$$

where $1/n$ is the spacing of the design points on [0,1]. In this case
the correlation matrix $\Gamma$ has only $n$ distinct elements and has the
symmetric Toeplitz form with $(i,j)$th element $\rho_n^{|i-j|}$, due to the fact
that $|x_i - x_j| = |i-j|/n$. Since the elements of the first row of a
symmetric Toeplitz matrix determine the distinct elements, we often
use the special notation

$$\Gamma = \mathrm{STOEPL}(1, \rho_n, \ldots, \rho_n^{n-1}) \tag{2.4.3}$$

to refer to this matrix. In general, a matrix $A$ is said to be
Toeplitz if its $(i,j)$th element is some function, say $a(j-i)$, as $j-i$
ranges from $1-n$ to $n-1$. We write this using notation

$$A = \mathrm{TOEPL}[a(1-n), \ldots, a(0), \ldots, a(n-1)]. \tag{2.4.4}$$

Notice that STOEPL requires that $a(i) = a(-i)$.
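The STOEPL notation translates directly into a small constructor. The function names below are hypothetical; the sketch simply builds a symmetric Toeplitz matrix from its first row and applies it to the correlation matrix in (2.4.3).

```python
import math


def stoepl(first_row):
    """Symmetric Toeplitz matrix whose (i, j)th element is first_row[|i - j|]."""
    n = len(first_row)
    return [[first_row[abs(i - j)] for j in range(n)] for i in range(n)]


def ou_correlation_matrix(n, alpha):
    """Gamma = STOEPL(1, rho_n, ..., rho_n^(n-1)) with rho_n = exp(-alpha / n)."""
    rho_n = math.exp(-alpha / n)
    return stoepl([rho_n ** k for k in range(n)])
```

Only the `n` values in the first row need to be stored, which reflects the remark that the matrix has only $n$ distinct elements.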

We are interested in finding necessary and sufficient conditions
under which $M(W_n^*) \to 0$. The following lemma will be useful in this
regard.

Lemma (2.4.1) Suppose

$$x_i = x_{ni} = (i-1)/n, \quad i = 1,2,\ldots,n,$$

$$\mathbf{g} = [g(x_{n1}),\ldots,g(x_{nn})]', \quad \text{and}$$

$$\Gamma = \mathrm{STOEPL}(1, \rho_n, \ldots, \rho_n^{n-1}).$$

Then the quadratic form $\mathbf{g}'\Gamma^{-1}\mathbf{g}$ may be written

$$\mathbf{g}'\Gamma^{-1}\mathbf{g} = (1-\rho_n^2)^{-1}\,[B_{n1} + S_n + B_{nn}], \tag{2.4.5}$$

where $S_n$, the sum of the interior terms, is

$$S_n = \sum_{i=2}^{n-1} \big[(1+\rho_n^2)\,g^2(x_{ni}) - \rho_n\, g(x_{ni})\{g(x_{n,i-1}) + g(x_{n,i+1})\}\big], \tag{2.4.6}$$

and $B_{n1}$, $B_{nn}$ are boundary terms with

$$B_{n1} = g(x_{n1})\,[g(x_{n1}) - \rho_n\, g(x_{n2})], \quad \text{and} \tag{2.4.7}$$

$$B_{nn} = g(x_{nn})\,[g(x_{nn}) - \rho_n\, g(x_{n,n-1})]. \tag{2.4.8}$$

Furthermore, if $g$ has at least two continuous derivatives, is right
and left differentiable at 0 and 1, respectively, and $g^2(x)$,
$g(x)g''(x)$ are integrable on [0,1], then

$$(1-\rho_n^2)^{-1} S_n \to (1/2\alpha) \int_0^1 g(x)\,[\alpha^2 g(x) - g''(x)]\,dx, \tag{2.4.9}$$

$$(1-\rho_n^2)^{-1} B_{n1} \to (1/2\alpha)\, g(0)\,[\alpha g(0) - g'(0+)], \quad \text{and} \tag{2.4.10}$$

$$(1-\rho_n^2)^{-1} B_{nn} \to (1/2\alpha)\, g(1)\,[\alpha g(1) + g'(1-)]. \tag{2.4.11}$$


Proof: The inverse of $\Gamma$ is a patterned matrix of tridiagonal form.
Direct multiplication yields the form (2.4.5) for $\mathbf{g}'\Gamma^{-1}\mathbf{g}$.

Next  we  factor  out  pn  ■  e_tt/n  in  Sn,  obtaining 
Sn  -  •’*/n  Z^2  9(*nl){[eo/n+e-o/n]g(xnl)-[g(xnfl.1)+g(xnfl+1)]}.  (2.4.12) 
Now,  adding  the  Taylor  series  expansions  for  e#/n  and  e~a/n  yields 

2  +  (a2/n2)  +  0(n"4).  (2.4.13) 
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Expanding  g^^)  and  g(Xnil+1)  about  tha  point  xni  yields 

g(*n,i-i>  ■  g^i*  +  g'<Xni)(-l/n)  +  g* (*£i)(l/n)2/2! ,  and 
9<xn,i+l>  “  9<xni)  +  g'(*ni)(l/n)  +  g' (x^)  (l/n)2/2! , 

where  x^  is  in  [*„,!_!,*„!]  and  is  in  [x^x,,^] .  Since  the 
first  derivative  terms  cancel  out,  we  obtain 

9<*n,i-l>  +  9<xn,i+l>  *  29<»nl>  +  ( l/2n2) [g' (x^1)+g' (x^) ] . (2 . 4 . 14) 
Substitution  into  (2.4.12),  factoring  out  (1/n2),  and  rearranging 
yields 

neo/nSn*(l/n)lJ~2g(xnl)  {(e2+0(n“2)  )g(xni)-(l/2)  [g*  (x^+g*  (x^)  ]*2.4.15) 
The  last  quantity  converges  to 

/o  g(*)U2g(«)  -  g'(x)]dx  (2.4.16) 

by  the  continuity  of  g'(x)  and  the  fact  that  (2.4.15)  has  the  form  of 
an  approximating  Riemann  sum.  Finally,  the  result  for  Sn  follows  from 
the  fact  that 

[n  ea/n]-1(i-p2)-1  -  t(l/n)e"o/n]/[l-exp(-2a/n)]  ->  l/2o 
by  an  application  of  L’ Hospital's  Rule. 

Next  consider  the  boundary  term  Bnl,  which  by  an  argument 
similar  to  that  above  may  be  expressed  as 

Bm  ■  g(xni>tg(xnl)  -  P„g(*n2>3 

-  (1/n)  g’(<!>  +  a  g(x„2)  +  g(xn2)  0(l/n)],  (2.4.17) 

where  x*x  is  in  [acnl # *nz  1  •  Since  xni  — *  0  for  i  fixed  (as  n  — »  »)  and 
g  and  its  first  derivative  are  continuous,  we  have 

g(*nl)  ■»  g(0),  g(*n2)  ■*  g(0),  and  g'(x*x)  ■+  g'(0+),  as  n  — ♦ 

The  result  for  Bnl  follows  by  noting  that 


(1/nXl-p‘r1  -♦  (1/2  a), 

from  another  application  of  L' Hospital's  Rule.  A  similar  argument 
produces  the  result  for  Bnn,  and  is  hence  omitted. 
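
The decomposition (2.4.5) and the limits (2.4.9)-(2.4.11) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the test function $g(x) = e^x$ and the helper name `quad_form_check` are hypothetical choices, not part of the original development. For this $g$ the limit of $g'\Gamma^{-1}g$ has the closed form coded below, obtained by evaluating the integral and the two boundary terms directly.

```python
import numpy as np

def quad_form_check(alpha, n):
    """Compare g' Gamma^{-1} g with decomposition (2.4.5) and its limit.

    Assumes the illustrative regression function g(x) = exp(x).
    """
    x = np.arange(n) / n                          # x_ni = (i-1)/n
    rho_n = np.exp(-alpha / n)                    # adjacent correlation (2.4.2)
    g = np.exp(x)
    idx = np.arange(n)
    Gamma = rho_n ** np.abs(idx[:, None] - idx[None, :])   # STOEPL(1, rho_n, ...)
    direct = g @ np.linalg.solve(Gamma, g)        # g' Gamma^{-1} g by a linear solve

    # interior sum (2.4.6) and boundary terms (2.4.7), (2.4.8)
    S = np.sum((1 + rho_n**2) * g[1:-1]**2 - rho_n * g[1:-1] * (g[:-2] + g[2:]))
    B1 = g[0] * (g[0] - rho_n * g[1])
    Bn = g[-1] * (g[-1] - rho_n * g[-2])
    decomp = (B1 + S + Bn) / (1 - rho_n**2)

    # limit: sum of (2.4.9)-(2.4.11) evaluated for g(x) = exp(x)
    e2 = np.exp(2.0)
    J = ((alpha**2 - 1) * (e2 - 1) / 2 + (alpha - 1) + e2 * (alpha + 1)) / (2 * alpha)
    return direct, decomp, J
```

The first two returned values should agree to rounding error for any $n$, while the third is approached only as $n$ grows.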

One  should  note  the  similarity  of  this  result  with  the  inner  product 
representation  of  a  Hilbert  space  investigated  in  Parzen  (1967, p. 403) 
in  the  context  of  regression  analysis  of  a  continuous  parameter  time 
series.  Now  we  are  equipped  to  easily  prove  the  main  result  of  this 
section. 


Theorem (2.4.1) Suppose $m$ is fixed, $g$ is not identically zero, and $g$ satisfies the conditions in Lemma (2.4.1). Then for the Ornstein-Uhlenbeck model (2.1.7), augmented to include $\rho = 0$, a necessary and sufficient condition for $M(W^*) \to 0$ as $n \to \infty$ is $\rho = 0$ ($\alpha = \infty$).

Proof: Suppose that $\rho = 0$. Then $\Gamma = I$ and has an eigenvalue 1 of multiplicity $n$. Hence $n/e_{n1}(\Gamma) = n \to \infty$ as $n \to \infty$ and $M(W^*) \to 0$ by Theorem (2.3.1). [Alternatively, an examination of (2.3.1) shows that $M(W^*) \sim \sigma^2/(mn) \to 0$ as $n \to \infty$.]

Next assume $M(W^*) \to 0$. For the sake of a contradiction, assume $\rho > 0$. Define the quantity $J(\alpha)$ to be

$$ J(\alpha) = (1/2\alpha) \left\{ \int_0^1 g(x) [\alpha^2 g(x) - g''(x)]\,dx + g(0) [\alpha g(0) - g'(0+)] + g(1) [\alpha g(1) + g'(1-)] \right\}. \tag{2.4.18} $$

By Lemma (2.4.1) we have $g'\Gamma^{-1}g \to J(\alpha) < \infty$. Recall from (2.3.1) that

$$ M(W^*) = [(g'g)/n]\, [1 + (m/\sigma^2)\, g'\Gamma^{-1}g]^{-1} \to 0, $$

thereby requiring that $g'\Gamma^{-1}g \to \infty$ (since $(g'g)/n$ approaches a finite, positive constant). We therefore have a contradiction and the theorem is proved. ■
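
A small numerical illustration of this dichotomy may be helpful. The Python sketch below evaluates the expression for $M(W^*)$ quoted from (2.3.1) on the equally spaced design; the regression function $g(x) = e^x$, the default values $m = 1$, $\sigma^2 = 1$, and the function name `optimal_mase` are assumptions made purely for illustration.

```python
import numpy as np

def optimal_mase(n, rho, m=1, sigma2=1.0):
    """M(W*) = [(g'g)/n] [1 + (m/sigma^2) g' Gamma^{-1} g]^{-1}.

    Illustrative choice g(x) = exp(x) on x_ni = (i-1)/n; rho = 0 gives Gamma = I.
    """
    x = np.arange(n) / n
    g = np.exp(x)
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])
    if rho > 0:
        Gamma = (rho ** (1.0 / n)) ** lag         # (i,j)th element rho_n^{|i-j|}
    else:
        Gamma = np.eye(n)                         # uncorrelated errors
    quad = g @ np.linalg.solve(Gamma, g)
    return (g @ g / n) / (1.0 + (m / sigma2) * quad)
```

Under these assumptions, `optimal_mase(n, 0.0)` behaves like $\sigma^2/(mn)$, whereas for a fixed $\rho > 0$ the value levels off at a positive constant as $n$ increases.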


2.5  Some  Aspects  of  Spectral  Theory 


In this section we will digress briefly to examine how the spectral theory corresponding to the Ornstein-Uhlenbeck covariance structure relates to the problem at hand. The interplay of the spectral density and the eigenvalues of the correlation matrix may help us better understand the conditions of Theorem (2.3.1). We will examine the spectral density, $f(t)$, of a stationary process having the covariance structure (2.1.6). In the equally-spaced case, this coincides with an autoregressive process of order 1, that is, an AR(1) process. Recall that the (normalized) spectral density, $f(t)$, is the Fourier transform of

$$ \gamma(u) = \mathrm{corr}[y(x), y(x+u)] = \exp(-\alpha|u|), \tag{2.5.1} $$

namely,

$$ f(t) = (1/2\pi) \int_{-\infty}^{\infty} \exp(-iut)\, \gamma(u)\,du. \tag{2.5.2} $$

The autocorrelation function may in turn be recovered through the relation,

$$ \gamma(u) = \int_{-\infty}^{\infty} \exp(iut)\, f(t)\,dt. \tag{2.5.3} $$

Since $\gamma$ is even, $f$ is real-valued and we may write (2.5.2) as

$$ \begin{aligned}
f(t) &= (1/2\pi) \int_{-\infty}^{\infty} \cos(tu)\, \gamma(u)\,du \\
&= (1/\pi) \int_0^{\infty} \exp(-\alpha u) \cos(tu)\,du \\
&= (1/\pi)\, \mathrm{Re}\left\{ \int_0^{\infty} \exp[-u(\alpha - it)]\,du \right\} \\
&= (1/\pi)\, \mathrm{Re}\{ 1/(\alpha - it) \} \\
&= \alpha / [\pi(\alpha^2 + t^2)].
\end{aligned} \tag{2.5.4} $$

Note that $f(t)$ is symmetric with

$$ \max_t f(t) = f(0) = 1/(\pi\alpha). \tag{2.5.5} $$

Now for discrete-time processes, (2.5.2) may alternatively be written

$$ f(t) = (1/2\pi) \sum_{v=-\infty}^{\infty} \gamma(v) \exp(-ivt) \tag{2.5.6} $$

$$ \phantom{f(t)} = (1/2\pi) \left[ 1 + 2 \sum_{v=1}^{\infty} \gamma(v) \cos(vt) \right], \quad t \text{ in } [-\pi, \pi]. \tag{2.5.7} $$

In our development we observe that the unit spacing $1/n$ tends to 0 as $n \to \infty$. Replacing $\rho$ with $\rho_n$ and $\alpha$ with $\alpha/n$, we obtain the following analogs:

$$ \gamma_n(v) = \exp(-\alpha|v|/n) = \rho_n^{|v|}, \quad v = 0, \pm 1, \pm 2, \ldots \tag{2.5.8} $$

$$ \begin{aligned}
f_n(t) &= (1/2\pi) \left[ 1 + 2 \sum_{v=1}^{\infty} \gamma_n(v) \cos(vt) \right] \\
&= (1/2\pi) (1-\rho_n^2)\, |1 - \rho_n e^{-it}|^{-2}
\end{aligned} \tag{2.5.9} $$

$$ \begin{aligned}
\phantom{f_n(t)} &\le (1/2\pi) (1-\rho_n^2) (1-\rho_n)^{-2} \\
&= (1/2\pi) [(1+\rho_n)/(1-\rho_n)] \\
&\equiv (1/2\pi)\, M_n(\rho), \text{ say.}
\end{aligned} \tag{2.5.10} $$

Note that $M_n(\rho)$ becomes unbounded as $n \to \infty$ unless $\rho = 0$, in which case it is identically 1. We will use the above results in subsequent sections, but first we establish some technical details in the form of a lemma.


Lemma (2.5.1) (Grenander and Szego, 1958)

Let $f$ be an integrable function on $[-\pi, \pi]$ satisfying

$$ m \le f(t) \le M, \quad \text{for all } t \text{ in } [-\pi, \pi]. \tag{2.5.11} $$

Let

$$ \gamma(u) = \int_{-\pi}^{\pi} \exp(iut)\, f(t)\,dt \tag{2.5.12} $$

and

$$ \Gamma = \mathrm{TOEPL}[\gamma(1-n), \ldots, \gamma(0), \ldots, \gamma(n-1)]. $$

Then there exists a constant $c$ for which

$$ c\,m \le e_{nn}(\Gamma) \le e_{n1}(\Gamma) \le c\,M. \tag{2.5.13} $$

For $g$ satisfying the conditions of Theorem (2.4.1) in the Ornstein-Uhlenbeck model, we know that $n/e_{n1}(\Gamma) \not\to \infty$, else Theorem (2.3.1) would be incorrect. Computer studies suggest that $n/e_{n1}(\Gamma)$ approaches a finite limit, which undoubtedly is a relatively simple function of $\rho$, which the authors were unable to discover. However, we know that

$$ n/e_{n1}(\Gamma) \ge n/[c\, M_n(\rho)] \to -(1/2c)\log(\rho) = \alpha/2c, \tag{2.5.14} $$

which provides a lower bound for this limit.
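
The closed form (2.5.9), the bound (2.5.10), and the eigenvalue bound behind (2.5.14) can be examined numerically. The Python sketch below is a hedged illustration: the helper name `ar1_spectral_check` and the truncation length `n_terms` are assumptions, and the eigenvalue comparison uses the constant $c = 2\pi$, which is the value that arises when $\gamma(u) = \int \exp(iut) f(t)\,dt$ and $f$ carries the $1/2\pi$ normalization used above, so that the bound reads $e_{n1}(\Gamma) \le M_n(\rho)$.

```python
import numpy as np

def ar1_spectral_check(alpha, n, n_terms=5000):
    """Return the truncated series and closed form of f_n(t), M_n(rho), and e_n1(Gamma)."""
    rho_n = np.exp(-alpha / n)
    t = np.linspace(-np.pi, np.pi, 201)
    v = np.arange(1, n_terms + 1)
    # truncated cosine series, first line of (2.5.9)
    series = (1 + 2 * (rho_n ** v) @ np.cos(np.outer(v, t))) / (2 * np.pi)
    # closed form, second line of (2.5.9)
    closed = (1 - rho_n**2) / (2 * np.pi * np.abs(1 - rho_n * np.exp(-1j * t))**2)
    M_n = (1 + rho_n) / (1 - rho_n)               # defined in (2.5.10)
    idx = np.arange(n)
    Gamma = rho_n ** np.abs(idx[:, None] - idx[None, :])
    e_max = np.linalg.eigvalsh(Gamma)[-1]         # largest eigenvalue e_n1(Gamma)
    return series, closed, M_n, e_max
```

Comparing `n / e_max` with `n / M_n` for increasing $n$ gives a rough picture of how far the lower bound in (2.5.14) sits below the actual ratio.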

The reader may question whether there exists any covariance function $\gamma$, independent of $n$, which would satisfy the eigenvalue condition of Theorem (2.3.1), other than the uncorrelated case. We conjecture there are none, but this assertion would need investigation. Of course, if $\gamma$ is allowed to change with $n$, then the condition can clearly be met. For instance, if $\gamma_n(u)$ (for $u \ne 0$) tends to zero sufficiently fast, then one would expect the condition to be met. On the other hand, if one recalls the natural application of growth curves, it seems that for fixed $m$, a larger value of $n$ would provide more accurate information about the $m$-sample of animals, but not about the entire population, of which $g(x)$ represents the true average response at time $x$. Hence for reasonable models and fixed $m$, the absence of a consistent estimator seems evident.


2.6  Pointwise  Expressions  at  the  True  Optimum 


Recall that our estimator has the form

$$ g_n(x) = \sum_i w_{ni}(x)\, \bar{y}(x_{ni}). \tag{2.6.1} $$

We may obtain the asymptotic behavior of this estimator at a point arbitrarily close to $x$ by choosing $j = j_n$ such that

$$ x_{nj} = (j-1)/n \to x, \quad \text{as } n \to \infty, $$

which, in formula (2.6.1), becomes

$$ g_n(x_{nj}) = \sum_i w_{ni}(x_{nj})\, \bar{y}(x_{ni}). \tag{2.6.2} $$

Of course, we could not hope to do better in a global sense than if we knew the true correlation and the true function and used

$$ g_n^*(x_{nj}) = \sum_i w_{ni}^*(x_{nj})\, \bar{y}(x_{ni}), \tag{2.6.3} $$

where $w_{ni}^*(x_{nj})$ is the $(i,j)$th element of the MASE-minimizing matrix

$$ W^* = [(\sigma^2/m)\Gamma + gg']^{-1} gg' = c_{mn}\, \Gamma^{-1} gg', \tag{2.6.4} $$

and the scalar quantity $c_{mn}$ is defined by

$$ c_{mn} = [(\sigma^2/m) + g'\Gamma^{-1}g]^{-1}. \tag{2.6.5} $$

We have shown that the optimum global measure of error, $M(W^*)$, does not approach zero and hence (2.6.3) fails to be globally consistent under mild restrictions on $g$ in the Ornstein-Uhlenbeck model. Of course we note that (2.6.3) is not an estimator since the weights depend on the very function we are attempting to estimate. We will abuse the terminology and refer to this quantity as an estimator when it is actually the optimal linear combination of the subject means. Note that the $j$th column of $W^*$ is the vector of weights used in (2.6.3) when "predicting" at $x_{nj}$. From (2.6.4) we see that this is the vector

$$ [w_{n1}^*(x_{nj}), \ldots, w_{nn}^*(x_{nj})]' \tag{2.6.6} $$

$$ \phantom{[w_{n1}^*]} = g(x_{nj})\, c_{mn}\, \Gamma^{-1} g. \tag{2.6.7} $$

Now $\Gamma^{-1}g$ is recoverable from equations (2.4.6), (2.4.7), and (2.4.8) as given in Lemma (2.4.1), yielding a formula for the $(i,j)$th element of $W^*$:

$$ w_{ni}^*(x_{nj}) = g(x_{nj})\, c_{mn}\, (1-\rho_n^2)^{-1}\, d_{ni}, \tag{2.6.8} $$

where the quantity $d_{ni}$ is

$$ d_{ni} = (1+\rho_n^2)\, g(x_{ni}) - \rho_n \{ g(x_{n,i-1}) + g(x_{n,i+1}) \} \tag{2.6.9} $$

for $i = 2, 3, \ldots, n-1$, with the end values being

$$ d_{n1} = g(x_{n1}) - \rho_n g(x_{n2}), \quad d_{nn} = g(x_{nn}) - \rho_n g(x_{n,n-1}). \tag{2.6.10} $$

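We next evaluate certain pointwise expressions, incorporating the true optimum global values. For reference, we will state the result in the following lemma.

Lemma (2.6.1) Suppose $g$, $\sigma^2$, and $\Gamma$ are known, and we use (2.6.3) to estimate $g(x_{nj})$. Then for $c_{mn}$ defined by (2.6.5), and writing $\Sigma = (\sigma^2/m)\Gamma$, we have

$$ \text{(a)} \quad E[g_n^*(x_{nj})] = g(x_{nj})\, c_{mn}\, g'\Gamma^{-1}g \tag{2.6.11} $$

$$ \text{(b)} \quad \mathrm{Bias}[g_n^*(x_{nj}), g(x_{nj})] = -g(x_{nj})\, (\sigma^2/m)\, c_{mn} \tag{2.6.12} $$

$$ \text{(c)} \quad \mathrm{Var}[g_n^*(x_{nj})] = g^2(x_{nj})\, (\sigma^2/m)\, c_{mn}^2\, g'\Gamma^{-1}g \tag{2.6.13} $$

$$ \text{(d)} \quad \mathrm{MSE}[g_n^*(x_{nj})] = (\sigma^2/m)\, c_{mn}\, g^2(x_{nj}) = g^2(x_{nj})\, [1 + g'\Sigma^{-1}g]^{-1} \tag{2.6.14} $$

Proof: For (a), we use (2.6.7) to obtain

$$ E[g_n^*(x_{nj})] = \sum_i w_{ni}^*(x_{nj})\, g(x_{ni}) = [g(x_{nj})\, c_{mn}\, \Gamma^{-1}g]'\, g = g(x_{nj})\, c_{mn}\, g'\Gamma^{-1}g. $$

For (c), we have

$$ \mathrm{Var}[g_n^*(x_{nj})] = [g(x_{nj})\, c_{mn}\, \Gamma^{-1}g]'\, \mathrm{Var}(\bar{y})\, [g(x_{nj})\, c_{mn}\, \Gamma^{-1}g], $$

from which (c) follows since $\mathrm{Var}(\bar{y}) = (\sigma^2/m)\Gamma$. Properties (b) and (d) follow easily from (a) and (c) since

$$ \mathrm{Bias}[g_n^*(x_{nj}), g(x_{nj})] = E[g_n^*(x_{nj})] - g(x_{nj}), \quad \text{and} \quad \mathrm{MSE}[g_n^*(x_{nj})] = \mathrm{Var}[g_n^*(x_{nj})] + \mathrm{Bias}^2[g_n^*(x_{nj}), g(x_{nj})]. $$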
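
The four expressions of Lemma (2.6.1) can be verified against a direct matrix computation. The Python sketch below is illustrative only: it assumes $g(x) = e^x$, default values $m = 2$ and $\sigma^2 = 1$, and a hypothetical helper name `pointwise_moments`; the index `j` is zero-based in the code.

```python
import numpy as np

def pointwise_moments(n, j, alpha, m=2, sigma2=1.0):
    """Moments of the optimal linear combination at x_nj, directly and via (2.6.11)-(2.6.14)."""
    x = np.arange(n) / n
    g = np.exp(x)                                   # illustrative regression function
    rho_n = np.exp(-alpha / n)
    idx = np.arange(n)
    Gamma = rho_n ** np.abs(idx[:, None] - idx[None, :])
    Sigma = (sigma2 / m) * Gamma                    # Var(ybar)
    gg = np.outer(g, g)
    W = np.linalg.solve(Sigma + gg, gg)             # W* of (2.6.4)
    w = W[:, j]                                     # weights used when predicting at x_nj

    mean = w @ g
    bias = mean - g[j]
    var = w @ Sigma @ w
    direct = {"mean": mean, "bias": bias, "var": var, "mse": var + bias**2}

    quad = g @ np.linalg.solve(Gamma, g)            # g' Gamma^{-1} g
    c = 1.0 / (sigma2 / m + quad)                   # c_mn of (2.6.5)
    formulas = {
        "mean": g[j] * c * quad,
        "bias": -g[j] * (sigma2 / m) * c,
        "var": g[j]**2 * (sigma2 / m) * c**2 * quad,
        "mse": (sigma2 / m) * c * g[j]**2,
    }
    return direct, formulas
```

The two dictionaries should agree entry by entry up to rounding error, for any admissible choice of `n`, `j`, and `alpha`.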


Next we obtain the limiting form of these quantities under certain conditions on the regression function $g$.

Theorem (2.6.1) Suppose $g$ satisfies the conditions of Lemma (2.4.1) and $g$ is not identically 0. Let $J(\alpha)$ denote the limit as $n \to \infty$ of $g'\Gamma^{-1}g$, as given in the lemma (with $\rho \ne 0$), that is,

$$ J(\alpha) = (1/2\alpha) \left\{ \int_0^1 g(x) [\alpha^2 g(x) - g''(x)]\,dx + g(0) [\alpha g(0) - g'(0+)] + g(1) [\alpha g(1) + g'(1-)] \right\}, \tag{2.6.15} $$

which is finite since $\alpha < \infty$. Let $J_m(\alpha)$ denote the limit of $g'\Sigma^{-1}g$, with $\Sigma = (\sigma^2/m)\Gamma$, as $n \to \infty$, which is

$$ J_m(\alpha) = (m/\sigma^2)\, J(\alpha), \tag{2.6.16} $$

and suppose

$$ x_{nj} = (j-1)/n \to x, \quad \text{as } n \to \infty. \tag{2.6.17} $$

Then we have, as $n \to \infty$,

$$ \text{(a)} \quad E[g_n^*(x_{nj})] \to g(x)\, J_m(\alpha)/(1 + J_m(\alpha)), \tag{2.6.18} $$

$$ \text{(b)} \quad \mathrm{Bias}[g_n^*(x_{nj}), g(x_{nj})] \to -g(x)/(1 + J_m(\alpha)), \tag{2.6.19} $$

$$ \text{(c)} \quad \mathrm{Var}[g_n^*(x_{nj})] \to g^2(x)\, J_m(\alpha)/(1 + J_m(\alpha))^2, \tag{2.6.20} $$

$$ \text{(d)} \quad \mathrm{MSE}[g_n^*(x_{nj})] \to g^2(x)/(1 + J_m(\alpha)). \tag{2.6.21} $$

Proof: The algebra follows easily from Lemma (2.6.1) by noting

$$ (\sigma^2/m)\, c_{mn} = (\sigma^2/m) [(\sigma^2/m) + g'\Gamma^{-1}g]^{-1} = [1 + g'\Sigma^{-1}g]^{-1} \to (\sigma^2/m) [(\sigma^2/m) + J(\alpha)]^{-1} = (1 + J_m(\alpha))^{-1}. \; ■ $$

In view of this result and the fact that for fixed $\alpha > 0$,

$$ J_m(\alpha) \to \infty \text{ if and only if } m \to \infty, \tag{2.6.22} $$

and for fixed $m$,

$$ J_m(\alpha) \to \infty \text{ if and only if } \alpha \to \infty \; (\rho = 0), $$

it is clear that the pointwise estimator evaluated at the optimum global weights will fail to be mean-square consistent (under reasonable conditions on $g$) unless $m \to \infty$, or we have an uncorrelated-errors model. Indeed, for a fixed value of $m$, this estimator is not even asymptotically unbiased when correlation exists in the model. Note also that for $g(x) \ne 0$ and $n \to \infty$,

$$ \mathrm{Bias}^2[g_n^*(x_{nj}), g(x_{nj})] / \mathrm{Var}[g_n^*(x_{nj})] \to 1/J_m(\alpha), \tag{2.6.23} $$

which in turn approaches zero if we let $m \to \infty$ or $\alpha \to \infty$. Hence the contribution of the bias to the mean squared error at a point becomes increasingly small as the sample size $m$ from the population of subjects increases. This is consistent with our observation in the proof of Theorem (2.4.1) that, for $\rho = 0$, $M(W^*) \sim \sigma^2/mn = (1/n)\mathrm{Var}[\bar{y}(x_{ni})]$. We note in passing that $\mathrm{MSE}[g_n^*(x_{nj})]$ must be the minimum MSE at the point $x_{nj}$ among all linear estimates, else Theorem (2.2.1) is contradicted.
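
As a worked illustration of the limits (2.6.18)-(2.6.21) and the ratio (2.6.23), the following Python sketch evaluates them for given values of $g(x)$, $J(\alpha)$, $m$, and $\sigma^2$. The function name `limiting_moments` and the numerical inputs used with it are hypothetical; only the formulas themselves come from the theorem.

```python
def limiting_moments(gx, J, m, sigma2):
    """Limits (2.6.18)-(2.6.21) given g(x), J(alpha), m and sigma^2."""
    Jm = (m / sigma2) * J                 # J_m(alpha) of (2.6.16)
    mean = gx * Jm / (1 + Jm)             # (2.6.18)
    bias = -gx / (1 + Jm)                 # (2.6.19)
    var = gx**2 * Jm / (1 + Jm)**2        # (2.6.20)
    mse = gx**2 / (1 + Jm)                # (2.6.21)
    return mean, bias, var, mse

# Example: the limiting MSE shrinks only as m grows, not as n grows.
for m in (1, 10, 100):
    print(m, limiting_moments(gx=2.0, J=8.0, m=m, sigma2=1.5)[3])
```

With $J(\alpha)$ held fixed, the limiting bias is nonzero for every finite $m$, which is the lack of asymptotic unbiasedness noted above.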


2.7  Approaches Using Continuous Realizations

In terms of averages, the model (2.1.1) may be written in a condensed form,

$$ \bar{y}(x_t) = g(x_t) + \bar\varepsilon(x_t), \quad t = 1, 2, \ldots, n, $$

$$ E[\bar\varepsilon(x_t)] = 0, $$

$$ \mathrm{cov}[\bar\varepsilon(x_t), \bar\varepsilon(x_v)] = R(x_t, x_v) \equiv (\sigma^2/m)\, \Gamma(x_t, x_v), \tag{2.7.1} $$

where we will take the $(t,v)$th element of the correlation matrix $\Gamma$ to be

$$ \Gamma(x_t, x_v) = \gamma(x_t - x_v), \tag{2.7.2} $$

which in the Ornstein-Uhlenbeck model is

$$ \Gamma(x_t, x_v) = \exp\{ -\alpha |x_t - x_v| \}. \tag{2.7.3} $$

Define continuous data realizations (or records) over the interval $[0,1]$ to be the random functions

$$ \{ \bar{y}(x) : \text{all } x \text{ in } [0,1] \}, \tag{2.7.4} $$

$$ \{ \bar\varepsilon(x) : \text{all } x \text{ in } [0,1] \}. \tag{2.7.5} $$

Our estimator may be written

$$ g_n(x) = \sum_i w_{ni}(x)\, \bar{y}(x_{ni}) = (1/n) \sum_i [n w_{ni}(x)]\, \bar{y}(x_{ni}). \tag{2.7.6} $$

Suppose that the weights $n w_{ni}(x)$ are of the form

$$ n w_{ni}(x) = K_n(x, x_{ni}). \tag{2.7.7} $$

Observe that kernel estimators may be written explicitly in this form. We may write

$$ g_n(x) = (1/n) \sum_i K_n(x, x_{ni})\, \bar{y}(x_{ni}) = \int_0^1 K_n(x, s)\, \bar{y}(s)\,dF_n(s). \tag{2.7.8} $$

If $g$ is absolutely integrable, $E|\bar\varepsilon(x_t)| < \infty$ for all $t$, and $K_n(x, \cdot)$ converges uniformly to $K(x, \cdot)$, it may be shown that (2.7.8) may be written as

$$ \int_0^1 K(x, s)\, \bar{y}(s)\,dF_n(s) + o_p(1). \tag{2.7.9} $$

Furthermore, if $K(x, \cdot)\bar{y}(\cdot)$ is continuous (wp1), the integral in (2.7.9) approaches

$$ \int_0^1 K(x, s)\, \bar{y}(s)\, f(s)\,ds, \quad \text{as } n \to \infty. \tag{2.7.10} $$

Recall that $f(s)$ is the design density, which in our situation is identically one for the uniformly spaced design. Hence for a continuous data record, ideally with $n = \infty$ data points available, one might be led to consider an estimator of the form

$$ g_\infty(x) = \int_0^1 K(x, s)\, \bar{y}(s)\,ds. \tag{2.7.11} $$

We find the mean, variance and mean squared error of this estimator in the following lemma.


Lemma (2.7.1) Suppose we consider the estimator (2.7.11) in the model (2.7.1) where

$$ E[\bar\varepsilon(x)] = 0 \quad \text{and} \quad \mathrm{cov}[\bar\varepsilon(x), \bar\varepsilon(s)] = R(x, s). \tag{2.7.12} $$

Then

$$ \text{(i)} \quad E[g_\infty(x)] = \int K(x, s)\, g(s)\,ds, \tag{2.7.13} $$

$$ \text{(ii)} \quad \mathrm{Var}[g_\infty(x)] = \iint K(x, s) K(x, t) R(s, t)\,ds\,dt, \tag{2.7.14} $$

$$ \text{(iii)} \quad \mathrm{MSE}[g_\infty(x)] = \iint K(x, s) K(x, t) R(s, t)\,ds\,dt + \left[ \int K(x, s)\, g(s)\,ds - g(x) \right]^2. \tag{2.7.15} $$

Proof: Clearly (i) follows from Fubini's Theorem. For (ii) consider

$$ \begin{aligned}
E[g_\infty^2(x)] &= E\left[ \int K(x, s)\, \bar{y}(s)\,ds \int K(x, t)\, \bar{y}(t)\,dt \right] \\
&= \iint K(x, s) K(x, t)\, E[\bar{y}(s)\bar{y}(t)]\,ds\,dt \\
&= \iint K(x, s) K(x, t) [R(s, t) + g(s) g(t)]\,ds\,dt.
\end{aligned} $$

Subtracting $E^2[g_\infty(x)]$ from the last expression yields (ii). The expression (iii) follows directly since it equals the sum of the variance and the squared bias. ■
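
The integrals in Lemma (2.7.1) are straightforward to approximate by quadrature. The Python sketch below uses a midpoint rule on $[0,1]$; the helper names `continuous_record_moments` and `ou_cov`, the grid size, and the example kernel are assumptions introduced for illustration and are not part of the original text.

```python
import numpy as np

def ou_cov(alpha, scale=1.0):
    """Ornstein-Uhlenbeck covariance R(s,t) = scale * exp(-alpha |s - t|); scale plays the role of sigma^2/m."""
    return lambda s, t: scale * np.exp(-alpha * np.abs(s - t))

def continuous_record_moments(K, g, R, x, n_grid=400):
    """Midpoint-rule approximations to (2.7.13), (2.7.14) and (2.7.15) at the point x."""
    s = (np.arange(n_grid) + 0.5) / n_grid          # midpoint nodes on [0, 1]
    h = 1.0 / n_grid
    k = K(x, s)                                     # kernel values K(x, s)
    mean = h * np.sum(k * g(s))                     # (2.7.13)
    var = h * h * (k @ R(s[:, None], s[None, :]) @ k)   # (2.7.14)
    mse = var + (mean - g(x))**2                    # (2.7.15)
    return mean, var, mse
```

For example, with the constant kernel $K(x,s) = 1$, a constant $g$, and `ou_cov(1.0)`, the variance should be close to $2(\alpha - 1 + e^{-\alpha})/\alpha^2$ evaluated at $\alpha = 1$, which is the double integral of $\exp(-\alpha|s-t|)$ over the unit square.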

One would like to find the function $K(x, \cdot)$ which minimizes

$$ \mathrm{MSE}[g_\infty(x)] = E[g_\infty(x) - g(x)]^2. \tag{2.7.16} $$

Using an approach based on the calculus of variations, one may prove the following theorem.

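Theorem (2.7.1) Consider the model (2.7.1) and the continuous data record estimator (2.7.11). Let $\phi$ be defined implicitly via the integral equation

$$ g(x) = \int_0^1 R(x, u)\, \phi(u)\,du, \tag{2.7.17} $$

and define the quantities

$$ L = \int_0^1 g(u)\, \phi(u)\,du, \tag{2.7.18} $$

$$ K^*(x, s) = \text{the minimizer of } \mathrm{MSE}[g_\infty(x)], \tag{2.7.19} $$

$$ g_\infty^*(x) = \int_0^1 K^*(x, s)\, \bar{y}(s)\,ds. \tag{2.7.20} $$

Then

$$ \text{(i)} \quad K^*(x, s) = g(x)\phi(s)/(1 + L), \tag{2.7.21} $$

$$ \text{(ii)} \quad g_\infty^*(x) = g(x) \left[ \int \phi(s)\bar{y}(s)\,ds \right] / (1 + L), \tag{2.7.22} $$

$$ \text{(iii)} \quad E[g_\infty^*(x)] = g(x)\, L/(1 + L), \tag{2.7.23} $$

$$ \text{(iv)} \quad \mathrm{Bias}[g_\infty^*(x), g(x)] = -g(x)/(1 + L), \tag{2.7.24} $$

$$ \text{(v)} \quad \mathrm{Var}[g_\infty^*(x)] = g^2(x)\, L/(1 + L)^2, \tag{2.7.25} $$

$$ \text{(vi)} \quad \mathrm{MSE}[g_\infty^*(x)] = g^2(x)/(1 + L). \tag{2.7.26} $$

Proof: The proof of this result was obtained from T. E. Wehrly, but is here omitted.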

The reader should note the analogy between the general results of this theorem and those of Theorem (2.6.1), which deals with the special case of the Ornstein-Uhlenbeck model. Observe that the previous theorem dealt with the limit of the global optimum for a finite number of points $n$, as $n \to \infty$, whereas the present theorem first allows $n \to \infty$ and then obtains the optimum at a given point $x$.

An interesting question is whether $J_m(\alpha)$ of Theorem (2.6.1) is the same as $L$ of the present theorem under the Ornstein-Uhlenbeck model. If so, we observe that the evaluation of finite-$n$ estimators at global optimizers and letting $n \to \infty$ leads to the same results as that of optimizing a continuous record estimator at a point. This should be true since the form of the estimator is global in the sense that it weights the observations across the entire interval.

It may be possible to directly extract $\phi$ in equation (2.7.17) by the theory of integral equations, as outlined in Whittaker and Watson (1963, ch. 11). Since the nucleus of the equation, $R(x, u)$, is symmetric, that is,

$$ R(x, u) = R(u, x) = (\sigma^2/m) \exp\{ -\alpha |x - u| \}, \tag{2.7.27} $$

a solution for $\phi$ is given by

$$ \phi(x) = \sum_n a_n \lambda_n \phi_n(x), \tag{2.7.28} $$

provided this series converges uniformly. The quantities $\lambda_n$ and $\phi_n(x)$ are the so-called characteristic numbers and functions of the nucleus $R(x, u)$. These quantities are also known as eigenvalues and eigenfunctions, respectively. The quantity $\sum_n a_n \phi_n(x)$ represents the expansion of $g(x)$. The reader is referred to Whittaker and Watson (1963, p. 231) for more details.

Mere calculation of the characteristic numbers may be quite a prodigious task. The characteristic numbers are the roots of the equation

$$ D(\lambda) = 1 + \sum_{n=1}^{\infty} a_n \lambda^n / n!, \tag{2.7.29} $$

where

$$ a_n = (-1)^n \int_0^1 \int_0^1 \cdots \int_0^1 \det(R)\,du_1 \cdots du_n, \tag{2.7.30} $$

and the $(i,j)$th element of the ($n$ by $n$) matrix $R$ (in the Ornstein-Uhlenbeck model) is

$$ R(u_i, u_j) = (\sigma^2/m) \exp\{ -\alpha |u_i - u_j| \}. \tag{2.7.31} $$

Fortunately, there exist other means of finding a solution for $\phi$, provided $g$ satisfies certain regularity conditions. One solution, based on Fourier analysis, may require fairly stringent conditions on $g$ which essentially "assume away" the difficulty posed by the boundary terms which are present in Theorem (2.4.1). We will not be concerned about this difficulty since Theorem (2.4.1) already takes these into account. The main emphasis will be to show that identical results are obtained, although in a more restricted case.

Theorem (2.7.2) Suppose

$$ g(x) = \int_{-\infty}^{\infty} R(x, s)\, \phi(s)\,ds, \tag{2.7.32} $$

where $R$ is of the form $R(x - s)$ and

$$ R(u) = (\sigma^2/m) \exp(-\alpha|u|). \tag{2.7.33} $$

Assume also that $g$ vanishes outside $[0,1]$ and is everywhere twice continuously differentiable. We then have

$$ \phi(s) = (m/2\alpha\sigma^2) [\alpha^2 g(s) - g''(s)], \tag{2.7.34} $$

and hence the optimum kernel is

$$ K^*(x, s) = g(x) [\alpha^2 g(s) - g''(s)] \Big/ \left[ (2\alpha\sigma^2/m) + \int_0^1 g(u) [\alpha^2 g(u) - g''(u)]\,du \right]. \tag{2.7.35} $$

Proof: We have the convolution

$$ g(x) = R * \phi(x) = \int_{-\infty}^{\infty} R(x - s)\, \phi(s)\,ds. \tag{2.7.36} $$

Let $f^+$ be the Fourier transform of the function $f$. Then by a well known property of convolutions,

$$ g^+ = (R * \phi)^+ = R^+ \phi^+. \tag{2.7.37} $$

It is also well known that for $R$ of the form (2.7.33),

$$ R^+(u) = (2\alpha\sigma^2/m)(\alpha^2 + u^2)^{-1}. \tag{2.7.38} $$

We now have

$$ \begin{aligned}
\phi(s) &= (1/2\pi) \int_{-\infty}^{\infty} e^{-isu} [g^+(u)/R^+(u)]\,du \\
&= (m/2\alpha\sigma^2) \left[ (1/2\pi)\, \alpha^2 \int e^{-isu} g^+(u)\,du + (1/2\pi) \int u^2 e^{-isu} g^+(u)\,du \right] \\
&= (m/2\alpha\sigma^2) [\alpha^2 g(s) - g''(s)],
\end{aligned} \tag{2.7.39} $$

where the last equality results from differentiating the inverse Fourier transform of $g^+$ twice under the integral sign. Since $g$ vanishes outside $[0,1]$, so does $\phi$, and it follows that

$$ g(x) = \int_0^1 R(x - s)\, \phi(s)\,ds. $$

The proof is completed by applying Theorem (2.7.1). ■
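
The solution (2.7.34) can be checked numerically by substituting it back into the integral equation. The Python sketch below is an illustration under stated assumptions: it uses the hypothetical test function $g(x) = \sin^3(\pi x)$ on $[0,1]$ (zero elsewhere), which vanishes together with its first two derivatives at both endpoints, a midpoint quadrature rule, and an invented helper name `check_phi`.

```python
import numpy as np

def check_phi(alpha, m=1, sigma2=1.0, n_grid=4000):
    """Evaluate int_0^1 R(x - s) phi(s) ds with phi from (2.7.34) and compare with g(x)."""
    s = (np.arange(n_grid) + 0.5) / n_grid                 # midpoint nodes on [0, 1]

    def g(u):
        return np.sin(np.pi * u)**3

    def d2g(u):                                            # second derivative of sin^3(pi u)
        su, cu = np.sin(np.pi * u), np.cos(np.pi * u)
        return 3 * np.pi**2 * su * (2 * cu**2 - su**2)

    phi = (m / (2 * alpha * sigma2)) * (alpha**2 * g(s) - d2g(s))    # (2.7.34)
    x = np.linspace(0.0, 1.0, 11)
    R = (sigma2 / m) * np.exp(-alpha * np.abs(x[:, None] - s[None, :]))   # (2.7.33)
    lhs = R @ phi / n_grid                                 # quadrature for the integral
    L = np.sum(g(s) * phi) / n_grid                        # L of (2.7.18)
    return x, lhs, g(x), L
```

If the quadrature is fine enough, `lhs` should reproduce `g(x)` closely at each evaluation point, and `L` can then be inserted into (2.7.21) to obtain the optimum kernel for this particular $g$.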


Define the quantity $J_m^*(\alpha)$ to be the expression (2.6.16) for $J_m(\alpha)$ without the boundary terms:

$$ J_m^*(\alpha) = (m/2\alpha\sigma^2) \int_0^1 g(s) [\alpha^2 g(s) - g''(s)]\,ds. \tag{2.7.40} $$

Observe that the boundary terms

$$ (m/2\alpha\sigma^2)\, g(0) [\alpha g(0) - g'(0+)], \tag{2.7.41} $$

$$ (m/2\alpha\sigma^2)\, g(1) [\alpha g(1) + g'(1-)], \tag{2.7.42} $$

vanish if $g(0) = g(1) = 0$, which is a condition of Theorem (2.7.2). Also note that $K^*(x, s)$ in Theorem (2.7.2) may be expressed as

$$ K^*(x, s) = g(x)\, \phi(s) / [1 + J_m^*(\alpha)], $$

with $\phi$ as in (2.7.34). From this it is easy to see that the results of Theorems (2.7.1) and (2.7.2) are the same, with $J_m(\alpha)$ replaced by $J_m^*(\alpha)$, provided $g$ satisfies certain boundary conditions.


2.8  Sufficient  Conditions  for  Mean-Square  Consistency 

The purpose of this section is to prove a theorem which gives sufficient conditions for which

$$ E[g_n(x) - g(x)]^2 \to 0, \quad \text{as } n \to \infty \tag{2.8.1} $$

at a point $x$. Recall that estimators which satisfy this property are said to be mean-square consistent estimators. It is sufficient to demonstrate that the variance and the squared bias tend to zero as $n \to \infty$. Earlier we discovered that if the Ornstein-Uhlenbeck correlation model holds, the estimator $g_n^*$ is not consistent for $g$ as $n \to \infty$. Obviously the Ornstein-Uhlenbeck function will violate the conditions of this theorem. Indeed, for a correlation function not dependent on $n$, it is not clear at this point whether or not the theorem will admit any consistent estimators at all when $m$ is fixed. However, it will later be shown that the theorem will admit estimators in the uncorrelated case. If one were willing to assume that the correlation function changes with $n$, then consistency is clearly possible. The approach outlined below is given by Georgiev and Greblicki (1986) in the case of uncorrelated errors and $m = 1$.

Theorem (2.8.1) Suppose the model (2.1.1) holds and fix $m$. Consider estimators of the form (2.1.8), where the weights may depend on the entire design vector. Let $C[0,1]$ denote the space of uniformly continuous functions defined on the interval $[0,1]$. Suppose for a given $x$ in $[0,1]$,

$$ \text{(i)} \quad \sum_i |w_{ni}(x)| \le B < \infty, \tag{2.8.2} $$

$$ \text{(ii)} \quad \sum_i w_{ni}(x) \to 1, \quad \text{as } n \to \infty, \tag{2.8.3} $$

$$ \text{(iii)} \quad \text{for all } \varepsilon > 0, \quad \sum_i |w_{ni}(x)|\, I\{ |x - x_{ni}| > \varepsilon \} \to 0, \quad \text{as } n \to \infty, \tag{2.8.4} $$

where $I\{\cdot\}$ is an indicator function, and

$$ \text{(iv)} \quad w_n'(x)\, \Gamma\, w_n(x) = \sum_i \sum_j w_{ni}(x) w_{nj}(x) \gamma(x_i - x_j) \to 0, \quad \text{as } n \to \infty, \tag{2.8.5} $$

where $w_n(x) = [w_{n1}(x), \ldots, w_{nn}(x)]'$. Then for $g \in C[0,1]$, $E[g_n(x) - g(x)]^2 \to 0$, as $n \to \infty$.


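Proof: Let $\varepsilon > 0$ and $x$ be fixed. For the bias consider

$$ \begin{aligned}
|E g_n(x) - g(x)| &= \Big| \sum_i w_{ni}(x) g(x_{ni}) - g(x) \Big| \\
&= \Big| \sum_i w_{ni}(x) [g(x_{ni}) - g(x) + g(x)] - g(x) \Big| \\
&= \Big| \sum_i w_{ni}(x) [g(x_{ni}) - g(x)] + g(x) \Big[ \sum_i w_{ni}(x) - 1 \Big] \Big| \\
&\le \sum_i |w_{ni}(x)|\, |g(x_{ni}) - g(x)| + |g(x)|\, \Big| \sum_i w_{ni}(x) - 1 \Big| \\
&= \sum_i |w_{ni}(x)|\, |g(x_{ni}) - g(x)|\, I\{ |x - x_{ni}| \le \varepsilon \} \\
&\quad + \sum_i |w_{ni}(x)|\, |g(x_{ni}) - g(x)|\, I\{ |x - x_{ni}| > \varepsilon \} \\
&\quad + |g(x)|\, \Big| \sum_i w_{ni}(x) - 1 \Big| \\
&\le \sum_i |w_{ni}(x)| \Big[ \sup_i |g(x_{ni}) - g(x)|\, I\{ |x - x_{ni}| \le \varepsilon \} \Big] \\
&\quad + \sum_i [ |g(x_{ni})| + |g(x)| ]\, |w_{ni}(x)|\, I\{ |x - x_{ni}| > \varepsilon \} \\
&\quad + |g(x)|\, \Big| \sum_i w_{ni}(x) - 1 \Big| \\
&\le \sup_{|x - z| \le \varepsilon} |g(x) - g(z)| \sum_i |w_{ni}(x)| \\
&\quad + 2 \sup_z |g(z)| \sum_i |w_{ni}(x)|\, I\{ |x - x_{ni}| > \varepsilon \} \\
&\quad + \sup_z |g(z)|\, \Big| \sum_i w_{ni}(x) - 1 \Big| \\
&\le B \sup_{|x - z| \le \varepsilon} |g(x) - g(z)| \\
&\quad + 2 \sup_z |g(z)| \sum_i |w_{ni}(x)|\, I\{ |x - x_{ni}| > \varepsilon \} \\
&\quad + \sup_z |g(z)|\, \Big| \sum_i w_{ni}(x) - 1 \Big|.
\end{aligned} \tag{2.8.6} $$

Now the second and third terms may be made as small as desired by choosing $n$ sufficiently large while the first term may be made arbitrarily small by letting $\varepsilon \to 0$ (by the uniform continuity of $g$). Hence the absolute bias,

$$ |E g_n(x) - g(x)| \to 0, \quad \text{as } n \to \infty. $$

Therefore the squared bias must also approach zero. The result follows by noting

$$ \mathrm{Var}[g_n(x)] = (\sigma^2/m)\, w_n'(x)\, \Gamma\, w_n(x) \to 0, \quad \text{as } n \to \infty, \tag{2.8.7} $$

using assumption (iv). ■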


Observe that assumption (iii) requires that the sum of the contribution of weights at points $x_{ni}$ (when predicting at $x$) die off for $x_{ni}$-values an arbitrary distance from $x$ (as $n$ gets large). This says that the estimation at point $x$ may be concentrated on the weights attached to $y$-values corresponding to $x_{ni}$-values in smaller and smaller neighborhoods of $x$ as $n$ gets larger. We will later see that this is what happens in the special case of the kernel form $w_{ni}(x; h)$ as the bandwidth parameter $h$ tends to 0 for increasing $n$. The following corollary simplifies some of the conditions of this theorem at the expense of being more restrictive.

Corollary (2.8.1) Consider the conditions of Theorem (2.8.1) and let $e_{n1}(\Gamma)$ denote the largest eigenvalue of $\Gamma$. Then

(a) Condition (i) is implied by Condition (ii) for nonnegative weights, $w_{ni}(x)$, for all $n$, $i$, and $x$.

(b) Conditions which imply Condition (iv) are

$$ e_{n1}(\Gamma) \sum_i w_{ni}^2(x) \to 0, \quad \text{as } n \to \infty, \tag{2.8.8} $$

or the even stronger condition,

$$ n \sum_i w_{ni}^2(x) \to 0, \quad \text{as } n \to \infty. \tag{2.8.9} $$

(c) If $e_{n1}(\Gamma)$ is bounded above for all $n$ (as in the uncorrelated case), Condition (iv) may be replaced by the weaker condition

$$ \sum_i w_{ni}^2(x) \to 0, \quad \text{as } n \to \infty. \tag{2.8.10} $$

(d) For $m \to \infty$, Condition (iv) is implied by

$$ (n/m) \sum_i w_{ni}^2(x) \to 0, \quad \text{as } n \to \infty. \tag{2.8.11} $$

Proof: For (a), note that

$$ \sum_i |w_{ni}(x)| = \sum_i w_{ni}(x), $$

which must be bounded in order for the limit to exist. For (b), recall

$$ (\sigma^2/m)\, w_n'(x)\, \Gamma\, w_n(x) \le (\sigma^2/m)\, e_{n1}(\Gamma)\, w_n'(x) w_n(x) \tag{2.8.12} $$

$$ \phantom{(\sigma^2/m)\, w_n'(x)\, \Gamma\, w_n(x)} \le (\sigma^2/m)\, n\, w_n'(x) w_n(x), \tag{2.8.13} $$

since the sum of the (nonnegative) eigenvalues of $\Gamma$ equals the trace of $\Gamma$, which has value 1 along the diagonal. We are done since (c) and (d) follow directly from (b). ■


Corollary (2.8.2) Consider Theorem (2.8.1). For the correlation model (2.1.7), Condition (iv) is implied by

$$ M_n(\rho) \sum_i w_{ni}^2(x) \to 0, \quad \text{as } n \to \infty. \tag{2.8.14} $$

Hence for $\rho = 0$, this is equivalent to

$$ \sum_i w_{ni}^2(x) \to 0, \quad \text{as } n \to \infty. $$

Proof: The result follows from Corollary (2.8.1) by recalling that

$$ M_n(\rho) = (1 + \rho^{1/n})/(1 - \rho^{1/n}) \tag{2.8.15} $$

is an upper bound for $e_{n1}(\Gamma)$, apart from a multiplicative constant free of $n$, and equals one for $\rho = 0$. ■
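
To see how Condition (iv) separates the uncorrelated and Ornstein-Uhlenbeck cases, the following Python sketch evaluates $\sum_i w_{ni}^2(x)$ and $w_n'(x)\Gamma w_n(x)$ for normalized kernel weights. Everything specific here is an assumption chosen for illustration: the Epanechnikov kernel, the bandwidth $h = n^{-1/5}$, the point $x = 0.5$, and the function name `nw_condition_iv`.

```python
import numpy as np

def nw_condition_iv(n, rho, x=0.5):
    """Return (sum of squared weights, w' Gamma w) for normalized kernel weights at x."""
    xi = np.arange(n) / n                                   # x_ni = (i-1)/n
    h = n ** (-0.2)                                         # illustrative bandwidth
    u = (x - xi) / h
    k = np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)    # Epanechnikov kernel on [-1, 1]
    w = k / k.sum()                                         # weights sum to one
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])
    # gamma(x_i - x_j) = rho^{|i-j|/n}; identity matrix in the uncorrelated case
    Gamma = np.eye(n) if rho == 0 else rho ** (lag / n)
    return w @ w, w @ Gamma @ w
```

Under these assumptions the quadratic form shrinks with $n$ when $\rho = 0$ but stays bounded away from zero for a fixed $\rho > 0$, because the weights concentrate on observations that are nearly perfectly correlated.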

We will now proceed to Chapter 3, where weights of the kernel type are considered. The results of this chapter will be summarized in Section 3.1 to motivate the approach taken for the numerical study.

3.  KERNEL ESTIMATORS OF REGRESSION FUNCTIONS


3.1  Introduction

A summary of the main points of Chapter 2 will lead us to the goal of Chapter 3 and serve as a useful reference. Throughout this chapter the term correlated errors model will refer to model (2.1.1) with the Ornstein-Uhlenbeck correlation function

$$ \gamma(u) = \rho^{|u|}, \quad 0 < \rho < 1 \tag{3.1.1} $$

with equi-spaced design points on $[0,1]$ with spacing $1/n$. By defining $0^0 = 1$, we may include $\rho = 0$ in (3.1.1) to refer to the uncorrelated errors model. Recall that

$$ \rho_n = \rho^{1/n} \tag{3.1.2} $$

is therefore the adjacent correlation, that is, the correlation of neighboring errors (on the same subject) in model (2.1.1). These results will not require any distributional assumptions to be made for the error terms apart from the fact that the variance is finite.

The estimators to be considered are again of linear form

$$ g_n(x) = \sum_{i=1}^{n} w_{ni}(x)\, \bar{y}(x_{ni}), \tag{3.1.3} $$

which in matrix form is

$$ g_n = W_n'\, \bar{y}, \tag{3.1.4} $$

where $g_n$ is the vector of estimates when predicting at $x_{n1}, \ldots, x_{nn}$ and $W_n$ is an ($n$ by $n$) matrix whose $(i,j)$th element is $w_{ni}(x_{nj})$. When allowed to vary freely, the best possible linear (BL) choice for $W_n$ (with respect to minimizing the mean averaged squared error) is denoted

$$ W_n^{BL} = [(\sigma^2/m)\Gamma_\rho + gg']^{-1} gg', \tag{3.1.5} $$

where $\Gamma_\rho$ is the ($n$ by $n$) correlation matrix with $(i,j)$th element $\rho_n^{|i-j|}$ and $g$ is the $n$-vector of true values evaluated at the design points.

We will refer to the quantity

$$ g_n^{BL}(x) = \sum_{i=1}^{n} w_{ni}^{BL}(x)\, \bar{y}(x_{ni}) \tag{3.1.6} $$

as the best possible linear estimator of $g$. The matrix notation when estimating at the design points is

$$ g_n^{BL} = [W_n^{BL}]'\, \bar{y}. \tag{3.1.7} $$

This is a slight abuse of the term estimator since $g_n^{BL}$ depends on $g$; however, such usage is common in function estimation. We examined the limiting form of $g_n^{BL}$ and the associated MASE and discovered that, unless $m \to \infty$ or $\rho = 0$, the best linear estimator fails to be consistent in our model. We set forth Theorem (2.8.1) which furnishes conditions on the model needed in order to obtain consistency.

In this chapter we will conduct a numerical study of four kernel estimators, whose names and primary references are given by

(1) PC: Priestley-Chao (Priestley and Chao, 1972),

(2) GM: Gasser-Müller (Gasser and Müller, 1979),

(3) NW: Nadaraya-Watson (Nadaraya (1964) and Watson (1964)), and

(4) GC: Gasser-Müller (cut-and-normalized) (Gasser and Müller, 1979).

We  will  refer  to  either  (1)  or  (2)  as  a 

NOCANE:  not-cut-and-normalized  estimator, 
and  to  (3)  or  (4)  as  a 


m 
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CANE:  cut-and-normalized  estimator . 

This  simply  means  that  the  weights  in  the  estimator  sum  to  one  and 
hence  the  estimator  is  a  true  weighted  average.  The  corresponding 
weights  wnl(x)  possess  a  functional  form  depending  on  x,  xnl,(and 
possibly  xnl, . . . ,x„n) ,  a  bandwidth  (smoothing)  parameter  h,  and  a 
given  weighting  (kernel)  function  R( • ) .  In  addition  to  the  best 
linear  estimator  BL,  we  form  estimators  g^c(x)l  g™(x),  g°M(x),  and 
g„c(x)  in  (3.1.3)  by  choosing  wni(x)  of  corresponding  form: 


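(1)  $w_{ni}^{PC}(x) = (1/h)\,(x_{n,i+1} - x_{ni})\,K[(x - x_{ni})/h]$,
     $i = 1,\dots,n$ ($x_{n,n+1}$ arbitrarily defined);             (3.1.8)

(2)  $w_{ni}^{NW}(x) = K[(x - x_{ni})/h] \,/\, \sum_{j=1}^{n} K[(x - x_{nj})/h]$,
     $i = 1,\dots,n$;                                               (3.1.9)

(3)  $w_{ni}^{GM}(x) = (1/h) \int_{A_i} K[(x - u)/h]\,du$,
     $A_i = [s_{n,i-1}, s_{ni}]$, $s_{ni} = (x_{ni} + x_{n,i+1})/2$;   (3.1.10)

(4)  $w_{ni}^{GC}(x) = w_{ni}^{GM}(x) \,/\, \sum_{j=1}^{n} w_{nj}^{GM}(x)$.   (3.1.11)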
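The four weight schemes (3.1.8)-(3.1.11) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the FORTRAN program used in the study: it assumes the Epanechnikov kernel and the equispaced design $x_{ni} = (i - 0.5)/n$ adopted later in Section 3.3, takes the Gasser-Muller bounds as $s_{ni} = i/n$ with $s_{n0} = 0$, and defines the arbitrary PC spacing at $i = n$ to be 1/n. All function names are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if -1.0 < u < 1.0 else 0.0


def epanechnikov_cdf(t):
    """Integral of the Epanechnikov kernel from -1 to t."""
    if t <= -1.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    return 0.5 + 0.75 * t - 0.25 * t ** 3


def design_points(n):
    """Assumed equispaced design x_i = (i - 0.5)/n, i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def weight_row(x, n, h, estimator):
    """Weights w_n1(x), ..., w_nn(x) for 'PC', 'NW', 'GM' or 'GC'."""
    if estimator not in ("PC", "NW", "GM", "GC"):
        raise ValueError("unknown estimator: %r" % (estimator,))
    if estimator in ("PC", "NW"):
        k = [epanechnikov((x - xi) / h) for xi in design_points(n)]
        if estimator == "PC":
            # (3.1.8) with constant spacing x_{i+1} - x_i = 1/n
            return [ki / (n * h) for ki in k]
        total = sum(k)  # (3.1.9): normalize the kernel values
        return [ki / total for ki in k]
    # (3.1.10): integral of (1/h) K[(x - u)/h] over [s_{i-1}, s_i], s_i = i/n
    w = [epanechnikov_cdf((x - (i - 1) / n) / h)
         - epanechnikov_cdf((x - i / n) / h) for i in range(1, n + 1)]
    if estimator == "GM":
        return w
    total = sum(w)  # (3.1.11): GM weights rescaled to sum to one
    return [wi / total for wi in w]


def weight_matrix(n, h, estimator):
    """n-by-n matrix whose i-th row holds the weights at x = x_i."""
    return [weight_row(x, n, h, estimator) for x in design_points(n)]
```

Near the boundary the GM weights sum to less than one (for example at x = 0.05 with n = 10 and h = 0.3), which is the deficit that the cut-and-normalized version GC removes.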


In the customary treatment of these estimators, with m = 1 in an
uncorrelated errors model, the bandwidth h > 0 depends on n and tends
to zero at a suitable rate if n is allowed to tend to infinity. In
our model, however, h will depend on m as well as n. When used in the
context of model (2.1.1), h will be understood to be of form $h_{n,m}$.

The kernels, which are typically probability densities, have support
[-1,1]. Infinite support kernels are not considered since they smear
boundary effects over the entire interval of estimation. We will
return to restrictions on kernels later.


Provided g satisfies certain restrictions, we recall from
Chapter 2 that the optimal kernel among convolution estimators

$\hat{g}_n(x) = \int K(x,s)\,\bar{y}(s)\,ds$  (3.1.12)

in the Ornstein-Uhlenbeck model ($\alpha = -\log\rho$) is

$K^*(x,s) = g(x)\,[\alpha^2 g(s) - g''(s)]\,\{(2\alpha\sigma^2/m) + \int g(u)[\alpha^2 g(u) - g''(u)]\,du\}^{-1}$.  (3.1.13)

We note again that this kernel is of little use in practice since one
does not know g(x) or $\alpha$ in (3.1.13). However, we notice that
$K^*(x,\cdot)$ depends on the correlation parameter $\alpha$ and is not
necessarily nonnegative. This suggests that possible improvement can be
made in the kernel function by incorporating some estimate of the
correlation.

This could be a difficult problem, which will be a moot point if it
turns out that using ordinary kernels and ignoring correlation
performs nearly as well as the unrestricted linear estimator, whose
limiting form led to the optimal kernel (3.1.13). One would further
be interested in knowing the behavior of the best bandwidth, as a
function of the amount of variance and correlation present, when the
naive estimators PC, NW, GM, and GC are used and the covariance
structure is taken into account. This may suggest that adjustments to
the bandwidth may be of much greater importance than modification of
the functional form of the kernel.

Toward these goals we have undertaken a small-scale numerical
study, the details of which will be presented in Section 3.3. We
first will review some analytic results, especially those of Hart and
Wehrly (1986), who have studied model (2.1.1) in detail (for fairly
general classes of correlation functions).

3.2  Recent Developments


Hart and Wehrly (1986) have considered model (2.1.1) under general
conditions on the correlation function $\gamma$. It is implicit that
$\gamma$ is even and positive definite with

$\gamma(0) = 1$,  $|\gamma(u)| \le 1$ for all u in [-1,1].  (3.2.1)

The estimator considered is the Gasser-Muller (GM) estimator
specified by (3.1.10). For analysis of real data, the authors propose
choosing h to minimize an estimate of the mean averaged squared
error, which may be expressed as

$M(h) = \mathrm{MASE}[\hat{g}_n^{GM}, g]$

$\quad = (1/n)\,E\{\sum_{i=1}^{n} [\hat{g}_n^{GM}(x_{ni}) - \bar{y}(x_{ni})]^2\} - (\sigma^2/m)\,[1 - (2/n)\,\mathrm{tr}(W_n^{GM}\Gamma)]$.  (3.2.2)

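In the equispaced case the n distinct elements of the correlation
matrix $\Gamma$ are functions of

$k\Delta = |x_{n,i+k} - x_{ni}|$,  k = 0,1,...  (3.2.3)

so that one may estimate $\sigma^2\gamma(k\Delta)$ by

$\hat{c}(k) = (1/mn) \sum_{i=1}^{m} \sum_{j=1}^{n-k} [y_i(x_j) - \bar{y}(x_j)]\,[y_i(x_{j+k}) - \bar{y}(x_{j+k})]$.  (3.2.4)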


Hence $\hat{M}(h)$ is formed by dropping expectation in (3.2.2) and taking
$\hat{\sigma}^2 = \hat{c}(0)$ and $\hat{\gamma}(k\Delta) = \hat{c}(k)/\hat{c}(0)$.
When using (3.2.2) and ignoring correlation, the authors report that
the data may be either oversmoothed or undersmoothed, depending on the
amount of correlation present and the relative sizes of $\sigma^2$ and n.
Asymptotic expressions are given for the mean squared error and are
summarized in the following theorems.
Theorem (3.2.1) -- Hart and Wehrly (1986)

Let x be in (0,1) and assume model (2.1.1) holds. Consider the GM
estimator as specified by (3.1.10). Let us assume

(i)   K has support [-1,1] and is Lipschitz continuous
      of order 1,  (3.2.5)

(ii)  $|\gamma(u) - \gamma(v)| \le B\,|u - v|$, some B > 0, and  (3.2.6)

(iii) $|x_{ni} - x_{n,i-1}| = O(1/n)$.  (3.2.7)

Then as $n, m \to \infty$, $h \to 0$,

(a)  $\mathrm{Var}[\hat{g}_n^{GM}(x)] = (\sigma^2/m) \int_{-1}^{1}\int_{-1}^{1} \gamma[(s-t)h]\,K(s)K(t)\,ds\,dt + O(1/nm)$.  (3.2.8)

Suppose, in addition, we assume

(iv)  g is Lipschitz continuous of order 1 on [0,1].  (3.2.9)

Then as $n, m \to \infty$ and $h \to 0$,

(b)  $\mathrm{MSE}[\hat{g}_n^{GM}(x)] = (\sigma^2/m) \int_{-1}^{1}\int_{-1}^{1} \gamma[(s-t)h]\,K(s)K(t)\,ds\,dt$

     $\quad + [\int_{-1}^{1} g(x - hu)\,K(u)\,du - g(x)]^2 + o(1/n)$.  (3.2.10)

Theorem (3.2.2) -- Hart and Wehrly (1986)

Suppose (i) - (iii) of Theorem (3.2.1) hold. Let m/n = O(1) as
$n, m \to \infty$ and assume that g is twice continuously differentiable
on [0,1] with $g''(x) \ne 0$. If $\gamma$ has left and right hand derivatives
at 0 with $\gamma'(0-) \ne \gamma'(0+)$, then as $n, m \to \infty$ and $h \to 0$,

$\mathrm{MSE}[\hat{g}_n^{GM}(x)] \sim (\sigma^2/m)\,[1 + \gamma'(0+)\,C_K h] + h^4 \sigma_K^4\,[g''(x)/2]^2$,  (3.2.11)

where

$C_K = 2 \int_{-1}^{1}\int_{t}^{1} (s - t)\,K(s)K(t)\,ds\,dt$  and  (3.2.12)

$\sigma_K^2 = \int_{-1}^{1} u^2 K(u)\,du$.  (3.2.13)

Furthermore, the bandwidth

$h_1 = [\sigma^2\gamma'(0-)\,C_K / \sigma_K^4 (g''(x))^2]^{1/3}\, m^{-1/3}$  (3.2.14)

is optimum in the sense that, for all n and m sufficiently large,

$\mathrm{MSE}[\hat{g}_n^{GM}(x; h_1)] < \mathrm{MSE}[\hat{g}_n^{GM}(x; h_{n,m})]$,  (3.2.15)

where $h_{n,m}$ is any sequence of bandwidths tending to zero as
$n, m \to \infty$ and such that

$\liminf_{n,m \to \infty} |h_{n,m}/h_1 - 1| > 0$.  (3.2.16)

Theorem (3.2.3) -- Hart and Wehrly (1986)

Let all the conditions of Theorem (3.2.2) hold with the exception
that $\gamma$ is assumed to be twice continuously differentiable on [-1,1]
with $\gamma''(0) \ne 0$. Then as $n, m \to \infty$ and $h \to 0$,

$\mathrm{MSE}[\hat{g}_n^{GM}(x)] \sim (\sigma^2/m)\,[1 + \gamma''(0)\,\sigma_K^2 h^2] + h^4 \sigma_K^4\,[g''(x)/2]^2$.  (3.2.17)

Furthermore, if in addition m/n = o(1), then the bandwidth

$h_2 = [-2\sigma^2\gamma''(0) / \sigma_K^2\, g''(x)^2]^{1/2}\, m^{-1/2}$  (3.2.18)

is optimum in the sense of Theorem (3.2.2).
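As a check on (3.2.14) and (3.2.18), the stated bandwidths can be recovered by minimizing the h-dependent part of the asymptotic expressions (3.2.11) and (3.2.17). The sketch below is our own calculation, not quoted from Hart and Wehrly (1986). It writes $\sigma_K^2$ for the kernel second moment in (3.2.13) and $C_K$ for the constant in (3.2.12), and it uses the fact that an even $\gamma$ satisfies $\gamma'(0+) = -\gamma'(0-)$. A minimum exists when the correlation decreases away from zero, that is, when $\gamma'(0+) < 0$ and $\gamma''(0) < 0$ respectively.

```latex
% Theorem (3.2.2): set the h-derivative of the right side of (3.2.11) to zero.
\frac{d}{dh}\left\{\frac{\sigma^2}{m}\,\gamma'(0+)\,C_K\,h
      + \frac{h^4\sigma_K^4\,g''(x)^2}{4}\right\}
  = \frac{\sigma^2}{m}\,\gamma'(0+)\,C_K + h^3\sigma_K^4\,g''(x)^2 = 0
\;\Longrightarrow\;
h^3 = \frac{-\sigma^2\gamma'(0+)\,C_K}{\sigma_K^4\,g''(x)^2\,m}
    = \frac{\sigma^2\gamma'(0-)\,C_K}{\sigma_K^4\,g''(x)^2\,m}.

% Theorem (3.2.3): the same step applied to (3.2.17).
\frac{d}{dh}\left\{\frac{\sigma^2}{m}\,\gamma''(0)\,\sigma_K^2\,h^2
      + \frac{h^4\sigma_K^4\,g''(x)^2}{4}\right\}
  = \frac{2\sigma^2}{m}\,\gamma''(0)\,\sigma_K^2\,h + h^3\sigma_K^4\,g''(x)^2 = 0
\;\Longrightarrow\;
h^2 = \frac{-2\sigma^2\gamma''(0)}{\sigma_K^2\,g''(x)^2\,m}.
```

Taking the cube root and the square root gives the $m^{-1/3}$ and $m^{-1/2}$ rates in (3.2.14) and (3.2.18).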

The following observations are made:

(1)  For m fixed in the GM estimator,

     $\mathrm{MSE}[\hat{g}_n^{GM}(x)] \to \sigma^2/m$  as  $h \to 0$, $nh \to \infty$,  (3.2.19)

     which precludes consistency in this situation.

(2)  If $m \to \infty$ in Theorem (3.2.1), we have mean square
     consistency without requiring $nh \to \infty$. However,
     Theorems (3.2.2) and (3.2.3) show that when m is not large
     relative to n, $nh \to \infty$ is required in order to minimize
     asymptotically the second order efficiency
     $\mathrm{MSE}[\hat{g}_n^{GM}(x)] - (\sigma^2/m)$.

(3)  Comparing $h_1$ and $h_2$ in (3.2.14) and (3.2.18), it is seen
     that the less smooth the correlation function $\gamma$ is at 0, the
     larger the bandwidth. This makes sense because in this
     situation, errors in observations further away from the point
     of estimation x will not be highly correlated with the errors
     near x. Therefore a larger bandwidth does not simply obtain
     redundant information about g(x).


The  authors  go  on  to  demonstrate  analytically  the  conditions 
under  which  there  would  be  a  larger  or  smaller  bandwidth  than 
expected  with  uncorrelated  errors. 
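To make the role of the kernel constants in Theorem (3.2.2) concrete, the following sketch evaluates $\sigma_K^2$ and $C_K$ numerically for the Epanechnikov kernel and plugs them into (3.2.11) and (3.2.14) for the Ornstein-Uhlenbeck correlation $\gamma(u) = \exp(-\alpha|u|)$, for which $\gamma'(0-) = \alpha$ and $\gamma'(0+) = -\alpha$. The midpoint-rule integration, the function names, and the numerical inputs in the usage lines (variance, m, $\rho$, and the value taken for $g''(x)$) are illustrative assumptions only.

```python
import math


def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if -1.0 < u < 1.0 else 0.0


def kernel_constants(kernel, grid=200):
    """Midpoint-rule approximations of sigma_K^2 (3.2.13) and C_K (3.2.12)."""
    step = 2.0 / grid
    pts = [-1.0 + (k + 0.5) * step for k in range(grid)]
    kv = [kernel(p) for p in pts]
    sigma_k2 = sum(p * p * w for p, w in zip(pts, kv)) * step
    # C_K equals the double integral of |s - t| K(s) K(t) over the square.
    c_k = sum(abs(s - t) * ws * wt
              for s, ws in zip(pts, kv)
              for t, wt in zip(pts, kv)) * step * step
    return sigma_k2, c_k


def asymptotic_mse(h, sigma2, m, alpha, g2, sigma_k2, c_k):
    """Right side of (3.2.11) with gamma'(0+) = -alpha; g2 stands for g''(x)."""
    return ((sigma2 / m) * (1.0 - alpha * c_k * h)
            + h ** 4 * sigma_k2 ** 2 * (g2 / 2.0) ** 2)


def optimal_bandwidth(sigma2, m, alpha, g2, sigma_k2, c_k):
    """Bandwidth (3.2.14) with gamma'(0-) = alpha."""
    ratio = sigma2 * alpha * c_k / (sigma_k2 ** 2 * g2 ** 2)
    return ratio ** (1.0 / 3.0) * m ** (-1.0 / 3.0)


# Hypothetical inputs: sigma^2 = 1, m = 50, rho = 0.2, g''(x) = 5.
sigma_k2, c_k = kernel_constants(epanechnikov)
alpha = -math.log(0.2)
h_opt = optimal_bandwidth(1.0, 50, alpha, 5.0, sigma_k2, c_k)
```

For the Epanechnikov kernel the two constants have the closed forms $\sigma_K^2 = 1/5$ and $C_K = 18/35$, which the numerical values should approximate.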

In another useful paper, Georgiev and Greblicki (1986) prove
that the PC, NW, and GM estimators satisfy the conditions of
Theorem (2.8.1) in the uncorrelated errors model with m = 1. In this
setting the following results for the PC, NW, and GM estimators are
proven.
Theorem (3.2.4) -- Georgiev and Greblicki (1986)

Let K be a probability density function such that

(i)   K is almost everywhere continuous on R,  (3.2.20)

(ii)  $|K(x)| \le H(x)$, all x, where H is symmetric,
      nonincreasing on $[0,\infty)$, and $\int H(y)\,dy < \infty$.  (3.2.21)

Let us further assume that

(iii) $\sup_x H(x) < \infty$,  (3.2.22)

(iv)  $\max_i |x_{ni} - x_{n,i-1}| = O(1/n)$,  (3.2.23)

(v)   x is in (0,1) and is a continuity point of g, and  (3.2.24)

(vi)  $h \to 0$, $nh \to \infty$ as $n \to \infty$.  (3.2.25)

Then, as $n \to \infty$,

$E[\hat{g}_n^{PC}(x) - g(x)]^2 \to 0$, and  (3.2.26)

$E[\hat{g}_n^{GM}(x) - g(x)]^2 \to 0$.  (3.2.27)

If, in addition, we assume

(vii) $c\,1\{|x| \le a\} \le K(x)$ for all x and some c > 0, a > 0,  (3.2.28)

then

$E[\hat{g}_n^{NW}(x) - g(x)]^2 \to 0$.  (3.2.29)


The condition (vi) is less restrictive than the conditions
imposed on the PC estimate by Priestley and Chao (1972) and Cheng and
Lin (1981a), who require the stronger assumption

$nh^2 \to \infty$ as $n \to \infty$.  (3.2.30)

Also, unlike results of Gasser and Muller (1979), the result for the
GM estimate requires neither a kernel with finite support nor
continuity of g over the entire interval. The methods for these
proofs simply show the general conditions of Theorem (2.8.1) are
satisfied for each $w_{ni}(x)$.

Other important work for the uncorrelated errors model includes
the work of Gasser and Muller (1979, 1984) and a series of papers by
Georgiev (1984a, 1984b, 1984c, 1984d) where the estimation of the
derivatives $g^{(p)}(x)$, p = 0,1,2,... is treated.


3.3  Description  of  the  Numerical  Study 

As  mentioned  in  Section  3.1,  the  main  questions  to  be  answered  in  the 
numerical  study  may  be  summarized  below.  For  a  given  n  and  various 
correlation-variance  combinations, 

(1)  How  well  does  BL  perform?  How  well  do  PC,  NW,  GM,  GC 
perform  relative  to  BL? 

(2)  What is an approximation to the best bandwidth of each of
the four estimators? How does it vary as a function of the
variance and correlation? How is the optimum bandwidth
affected if correlation is ignored? Does the expected value
curve defined by $E[\hat{g}_n(x; h^*)]$ seem to agree with the true
g(x) when the optimum bandwidth $h^*$ is used?

(3)  Which kernel estimators appear to do the best job? What is
the effect of a cut-and-normalized estimator? Is there a
need to modify the form of the estimator to take correlation
into account?

A study of five functions defined over the range [0,1] was
undertaken:

(A)  $g_A(x) = \sin[\pi x + (\pi/2)]$  (3.3.1)

(B)  $g_B(x) = 20 + \sin[\pi x + (\pi/2)]$  (3.3.2)

(C)  $g_C(x) = 2 + 3x$  (3.3.3)

(D)  $g_D(x) = 1 - x$  (3.3.4)

(E)  $g_E(x) = \sin(2\pi x)$  (3.3.5)

These functions are all special cases of

$g(x) = a + bx + \sin(cx + d)$

for appropriate constants a, b, c and d. The FORTRAN program used in
the numerical study was designed to handle any function of this type,
the only restriction being that d must be 0 if c = 0. Integral
expressions for certain theoretical quantities in Chapter 2 were
worked out and calculated for arbitrary a, b, c and d in a
subroutine. These quantities afforded a check on the reasonableness
of some of the analytical results of Chapter 2 such as the limit of
$\mathrm{MASE}(W_n^{BL})$ and the limit of the boundary terms of $g'\Gamma^{-1}g$.

In  selecting  a,  b,  c  and  d  for  the  study,  it  was  decided  to 
include  a  function  (such  as  A,  B)  which  possessed  a  vanishing  first 
derivative  at  the  boundaries  0  and  1.  The  corresponding  effect  on  the 
boundary  bias  was  of  interest. 

Function B is identical to A except for an additive constant of
20. We wished to investigate whether this shift could lead to an
optimal bandwidth different from that for function A. Intuition
suggests that the same amount of smoothing should be done, although
this turns out not to be the case for all estimators.


Functions C and D are straight line models chosen for comparison
purposes. Function E is a pure sinusoid which vanishes at the
endpoints. For this reason the CANE would not necessarily be expected
to outperform the NOCANE.

Functions (D) and (E) were considered by Benedetti (1974) for
the PC and NW estimators in the uncorrelated model with m = 1. We
will return to the reasons for our choice of functions in a later
discussion.

The study was carried out for each function under a set of
conditions corresponding to

(1)  The Epanechnikov kernel [see Epanechnikov (1969)] in each
     estimator: $K(u) = 0.75\,(1 - u^2)\,I_{(-1,1)}(u)$;  (3.3.6)

(2)  n = 10, 25 for the number of design points
     $x_{ni} = (i - 0.5)/n$, $i = 1,\dots,n$;  (3.3.7)

(3)  $\mathrm{Var}[\bar{y}(x)] = \sigma^2/m$ = 10.0, 1.0, 0.1  (3.3.8)
     (This sequence corresponds to the effect of
     increasing m, the number of subjects).

(4)  $\rho$      = 0.0, 0.0001, 0.2000, 0.7000  (3.3.9)
     $\rho_{10}$ = 0.0, 0.3981, 0.8513, 0.9650  (3.3.10)
     $\rho_{25}$ = 0.0, 0.6918, 0.9377, 0.9858.  (3.3.11)


We have $\rho_n = \rho^{1/n}$, as in (3.1.2). The spacing of design points
is 1/n, where (3.3.7) results in $s_{ni} = i/n$ for the Gasser-Muller
integral bounds. Since m and $\sigma^2$ always appear together in the
measure of error, we will regard $\sigma^2/m$ as a single quantity
representing the common variance of the sample means
$\bar{y}(x_{n1}),\dots,\bar{y}(x_{nn})$.

Recall that $\rho_n$ represents the correlation in adjacent within-
subject errors. Suppose that the researcher believes (or estimates)
that for n=10 equally-spaced design points, $\rho_n$ = 0.8. By (3.1.2) we
have the unit correlation $\rho$ = 0.1074. If the researcher were then to
have a finer grid of n=20 points available over the same x-range, the
adjacent correlation would be $(0.1074)^{1/20}$ = 0.8944, provided the
errors follow the Ornstein-Uhlenbeck model. In our situation it is
much easier to interpret $\rho_n$ = 0.8944 than $\rho$ = 0.1074 since the
latter value represents the correlation at the extremes of the interval
[0,1]. The use of $\rho$ is convenient, however, as an input to a computer
program designed to investigate the effect of finer spacings as we
increase the value of n.
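The conversion between the unit correlation and the adjacent correlation is a one-line computation. The short sketch below (function names are hypothetical) reproduces the worked example above and the $\rho_{10}$ and $\rho_{25}$ values listed in (3.3.10) and (3.3.11), assuming the Ornstein-Uhlenbeck relation $\rho_n = \rho^{1/n}$.

```python
def adjacent_correlation(rho, n):
    """Adjacent correlation rho_n = rho**(1/n) for n equispaced points on [0,1]."""
    return rho ** (1.0 / n)


def unit_correlation(rho_n, n):
    """Unit correlation rho = rho_n**n implied by an adjacent correlation."""
    return rho_n ** n


# Worked example from the text: rho_n = 0.8 at n = 10, then a grid of n = 20.
rho = unit_correlation(0.8, 10)
rho_20 = adjacent_correlation(0.1074, 20)

# Values of rho_10 and rho_25 for the unit correlations in (3.3.9).
rho_10_values = [adjacent_correlation(r, 10) for r in (0.0001, 0.2, 0.7)]
rho_25_values = [adjacent_correlation(r, 25) for r in (0.0001, 0.2, 0.7)]
```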

As a function of $W_n$, the mean averaged squared error may be
equivalently written

$\mathrm{MASE}[W_n] = (\sigma^2/mn)\,\mathrm{tr}(W_n \Gamma W_n') + (1/n)\,(W_n g - g)'(W_n g - g)$,  (3.3.12)

where the terms represent the portions due to variance and squared
bias. Efficient methods may be found for evaluating this expression
and are outlined in the documentation of the computer routines
available from the authors. For GM weights, Hart and Wehrly (1986)
have also given a reduction of (3.3.12). Also recall that the BL MASE
simplifies to

$\mathrm{MASE}[W_n^{BL}] = [g'g/n]\,[1 - \mathrm{tr}(W_n^{BL})]$.  (3.3.13)
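A direct, unoptimized evaluation of (3.3.12) and of the best linear MASE can be sketched as follows. This is not the authors' FORTRAN routine. The closed form used for BL is our own derivation: minimizing (3.3.12) row by row gives $W_n^{BL} = gg'(gg' + (\sigma^2/m)\Gamma)^{-1}$, so that $1 - \mathrm{tr}(W_n^{BL}) = 1/(1+q)$ with $q = g'\Gamma^{-1}g/(\sigma^2/m)$. The Ornstein-Uhlenbeck correlation matrix with entries $\rho_n^{|i-j|}$, the design $x_{ni} = (i-0.5)/n$, and all function names are assumptions made for the illustration.

```python
import math


def design_points(n):
    """Assumed equispaced design x_i = (i - 0.5)/n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def ou_correlation(n, rho):
    """Matrix with entries rho_n**|i-j|, where rho_n = rho**(1/n)."""
    r = rho ** (1.0 / n)
    return [[r ** abs(i - j) for j in range(n)] for i in range(n)]


def mase(W, g, gamma, var_mean):
    """(3.3.12): variance part plus squared-bias part; var_mean = sigma^2/m."""
    n = len(g)
    trace = sum(W[i][j] * gamma[j][k] * W[i][k]
                for i in range(n) for j in range(n) for k in range(n))
    bias = [sum(W[i][j] * g[j] for j in range(n)) - g[i] for i in range(n)]
    return var_mean * trace / n + sum(b * b for b in bias) / n


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def best_linear_mase(g, gamma, var_mean):
    """(3.3.13) written as (g'g/n)/(1 + q), q = g' Gamma^{-1} g / var_mean."""
    n = len(g)
    z = solve(gamma, g)
    q = sum(gi * zi for gi, zi in zip(g, z)) / var_mean
    return sum(gi * gi for gi in g) / n / (1.0 + q)


# Function A on n = 10 design points, variance of the mean equal to 10.
g_a = [math.sin(math.pi * x + math.pi / 2) for x in design_points(10)]
bl_uncorrelated = best_linear_mase(g_a, ou_correlation(10, 0.0), 10.0)
bl_correlated = best_linear_mase(g_a, ou_correlation(10, 0.7), 10.0)
```

Under these assumptions the two values agree to three decimals with the BL entries 0.333 and 0.279 reported for function A in Table 1a.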


Now when using a kernel estimator, $W_n$ is a matrix function of the
design points, the bandwidth h, and the kernel function K. The
following notation will be used when referring to a particular kernel
estimate:

Best Linear "Estimator":      BL
Estimator:                    PC, NW, GM, GC
Bandwidth:                    (h; estimator)
Optimal bandwidth:            (h*; estimator)                (3.3.14)
Weight matrix:                W(h; estimator)
Mean averaged squared error:  M(h; estimator).

To obtain h*, M(h; estimator) will need to be searched over some
interval (0,a]. Since we are predicting on [0,1], a bandwidth larger
than 1 will produce a very smooth estimator, and hence a smooth
expected value of the estimator. A larger bandwidth than this,
although possibly decreasing the MASE, will result in estimates with
negligible visual differences. For this reason and other efficiency
considerations, the bandwidth searches were truncated to the interval
(0,2], which should be a large enough interval to assess the behavior
of the estimate. M(h; estimator) was evaluated for an initial
equispaced grid. The interval of uncertainty was then reduced to be the
interval centered at the minimizing h by taking the new endpoints to
be the neighboring grid values at which M had previously been
evaluated. On the next and subsequent passes there were two new
evaluations made at points 1/4 and 3/4 of the distance between the
endpoints of the interval of uncertainty. These were compared with the
end and middle points already evaluated. The process was repeated until
the width of the interval of uncertainty fell below a user-supplied
tolerance, usually 0.02. Given the initial interval (0,2] and the above
recipe for searching, it is possible that the algorithm will choose
exactly the same optimal h for more than one kernel estimator. If the
minima were approximately the same, such results would not be
surprising since there are a countable number of possible endings of
evaluation sequences when the searches are conducted in the same manner.
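The search recipe just described can be sketched as follows. This is a reconstruction from the prose, not the original FORTRAN routine: the size of the initial grid (20 points) is an assumption, as is the handling of a minimum that falls on an endpoint of the current interval, where the sketch simply keeps that endpoint and its single neighbor. The objective `mase` stands for any function h -> M(h; estimator); the quadratic used in the usage line is purely illustrative.

```python
def search_bandwidth(mase, upper=2.0, grid_size=20, tol=0.02):
    """Grid-then-refine search for a minimizer of mase(h) on (0, upper]."""
    cache = {}

    def evaluate(h):
        # Reuse earlier evaluations when the same point comes up again.
        if h not in cache:
            cache[h] = mase(h)
        return cache[h]

    # First pass: equispaced grid on (0, upper].
    grid = [upper * (k + 1) / grid_size for k in range(grid_size)]
    best = min(range(grid_size), key=lambda k: evaluate(grid[k]))
    h_best = grid[best]
    lo = grid[max(best - 1, 0)]
    hi = grid[min(best + 1, grid_size - 1)]

    # Later passes: add the 1/4 and 3/4 points, compare them with the ends
    # and the middle, and keep the neighbors of the best point.
    while hi - lo > tol:
        pts = [lo + (hi - lo) * q for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        j = min(range(5), key=lambda k: evaluate(pts[k]))
        h_best = pts[j]
        lo, hi = pts[max(j - 1, 0)], pts[min(j + 1, 4)]
    return h_best


# Illustrative objective with its minimum at h = 0.37.
h_star = search_bandwidth(lambda h: (h - 0.37) ** 2)
```

Because every search starts from the same grid and halves the interval in the same way, different estimators with nearby minima can end on exactly the same reported h, and an objective that keeps decreasing in h ends at the upper limit 2.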

To answer questions in (1), we may compare the difference of
M(h*; estimator) and $M(W_n^{BL})$. The portions due to bias and variance
follow from (3.3.12). The last two questions will be answered by
analytic considerations in the next sections. Observe that since the
kernel estimators are only bandwidth-modified for various $\rho$, the
effect on the bandwidth of ignoring existing correlation may be
obtained by comparing the $\rho$ = 0 case to the desired $\rho$ > 0 case. For
cases with $\rho$ > 0, however, we do not calculate $M(h_0; \text{estimator})$,
where $h_0$ is the best bandwidth when $\rho$ = 0.

3.4  Results  of  the  Numerical  Study 

Before discussing the empirical results, it will be useful to
investigate the behavior of the estimators for large and small
bandwidths (with n fixed) and how this might correspondingly affect
the MASE criterion. Understanding this behavior will help explain
certain aspects of the numerical study. We begin by observing the
following properties of the (n by n) weight matrix for each estimator,
which we denote W(h; estimator):


(1)  For all h > 0, W(h;PC) and W(h;GM) are symmetric while
     W(h;NW), W(h;GC) may not be symmetric.  (3.4.1)

(2)  For n fixed and h sufficiently small,

     $W(h; \text{estimator}) = c_h I_n$,  (3.4.2)

     where $c_h$ is a constant depending on the estimator.
     Furthermore, if we let $h \to 0$,

     $c_h \to \infty$ for PC, and  (3.4.3)

     $c_h \to 1$ for NW, GM, GC.  (3.4.4)

(3)  As $h \to \infty$ (n fixed),

     $W(h; \text{estimator}) = d_h (1/n)(1_n 1_n') + o(1)$,  (3.4.5)

     where o(1) is a matrix with each element tending to 0. As
     $h \to \infty$, the constant $d_h$ depends upon the estimator used:

     PC, GM:  $d_h \to 0$, and  (3.4.6)

     NW, GC:  $d_h \to 1$.  (3.4.7)

Now if a weight matrix of form (3.4.2) is used, the corresponding MASE
becomes

$M[c_h I_n] = c_h^2\,(\sigma^2/m) + (c_h - 1)^2\,(S^2 + \bar{g}^2)$,  (3.4.8)

where we define

$\bar{g} = \sum_{i=1}^{n} g(x_{ni})/n$, and  (3.4.9)

$S^2 = \sum_{i=1}^{n} [g(x_{ni}) - \bar{g}]^2/n$.  (3.4.10)

If we ignore the o(1) term and use a weight matrix of form (3.4.5),
the form of the MASE is

$M[d_h (1/n) 1_n 1_n'] = d_h^2\,\bar{\rho}_n\,(\sigma^2/m) + (d_h - 1)^2\,\bar{g}^2 + S^2$,  (3.4.11)

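where $\bar{\rho}_n$ is the average of the elements of the correlation
matrix $\Gamma$,

$\bar{\rho}_n = n^{-2} \sum_{i=1}^{n} \sum_{j=1}^{n} \rho_n^{|i-j|}$.  (3.4.12)

Observe that $0 < \bar{\rho}_n < 1$ and, for $\rho$ = 0,

$\bar{\rho}_n = 1/n \to 0$ as $n \to \infty$,  (3.4.13)

while for $\rho$ > 0,

$\bar{\rho}_n \to \int_0^1\int_0^1 \exp\{-\alpha|x - y|\}\,dx\,dy$, as $n \to \infty$.  (3.4.14)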

From equations (3.4.8) and (3.4.11) it is clear that for h large or
small, the MASE for a given estimator depends on the relative sizes
of $\mathrm{Var}[\bar{y}] = \sigma^2/m$, $\bar{g}$, and $S^2$.
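For completeness, both limiting forms follow from substituting the special weight matrices into (3.3.12). The short derivation below is our own verification rather than part of the original text. It uses $\mathrm{tr}(\Gamma) = n$, the identity $g'g/n = S^2 + \bar{g}^2$, and $1_n'\Gamma 1_n = n^2\bar{\rho}_n$, where $\bar{\rho}_n$ denotes the average of the elements of $\Gamma$ and $c$, $d$ stand for $c_h$, $d_h$.

```latex
% Small-h form, W = c I_n:
\mathrm{MASE}[c I_n]
  = \frac{\sigma^2}{mn}\,c^2\,\mathrm{tr}(\Gamma) + \frac{(c-1)^2}{n}\,g'g
  = c^2\,\frac{\sigma^2}{m} + (c-1)^2\,(S^2 + \bar{g}^2).

% Large-h form, W = d\,n^{-1} 1_n 1_n':
\mathrm{tr}(W \Gamma W')
  = \frac{d^2}{n^2}\,(1_n' \Gamma 1_n)\,(1_n' 1_n)
  = d^2\, n\, \bar{\rho}_n ,
\qquad
W g - g = d\,\bar{g}\,1_n - g ,

\frac{1}{n}\,(Wg - g)'(Wg - g)
  = \frac{1}{n}\sum_{i=1}^{n}\bigl[(d-1)\bar{g} - (g(x_{ni}) - \bar{g})\bigr]^2
  = (d-1)^2\,\bar{g}^2 + S^2 ,

\mathrm{MASE}\bigl[d\,n^{-1} 1_n 1_n'\bigr]
  = d^2\,\bar{\rho}_n\,\frac{\sigma^2}{m} + (d-1)^2\,\bar{g}^2 + S^2 .
```

The cross term in the squared bias vanishes because the deviations $g(x_{ni}) - \bar{g}$ sum to zero.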


As we shall soon see in the numerical results, there were
situations where the optimum h was chosen extraordinarily large or
small. Furthermore, there was often a large discrepancy in
M(h*; estimator) for the cut-and-normalized estimators NW, GC and the
non-cut-and-normalized estimators PC and GM. In some situations,
depending on the characteristics of the function (e.g., $\bar{g}$ and $S^2$),
the NOCANE's had a smaller MASE than the CANE's. This behavior might
be predictable whenever there is an extreme imbalance in the variance
of the subject mean and the "variance" measure of the function g,
that is, whenever

$\mathrm{Var}[\bar{y}] \ll S^2 + \bar{g}^2$  or  $\mathrm{Var}[\bar{y}] \gg S^2 + \bar{g}^2$.  (3.4.15)

These two competing measures of variation form the predominant
components of the MASE. Whenever one is much larger than the other,
the bandwidth for a given estimator will tend to be selected in a
manner which makes $c_h$ (or $d_h$) either very near 0 or very near 1,
depending on the situation in (3.4.15). Examples of this behavior
will be pointed out after the numerical summary is presented.


Summary tables of the computer study appear in the ensuing
discussion. Tables (1a-1d) refer to function A, Tables (2a-2d) to
function B, and so on. Tables labelled "a" and "c" contain the
variable

M(h*; estimator), for n=10 and n=25,

respectively. The best possible MASE for a linear estimator is also
displayed in these tables. Tables labelled "b" and "d" contain the
variable

(h*; estimator), for n=10 and n=25,

respectively. Since there is little discrepancy between the results
for n=10 and n=25, the reader may obtain the salient information on
the optimal MASE and bandwidth by considering only tables labelled
either "a" and "b" or tables labelled "c" and "d".

Comparisons of Functions A, B.

For this discussion we refer to Table 1a, Table 1b, Table 1c, and
Table 1d for function A, and Table 2a, Table 2b, Table 2c, and
Table 2d for function B.

For function A, $M(W_n^{BL})$ is smallest for low variance and high
correlation, while for B it is smallest for low variance and low
correlation. This is probably due to the fact that $\bar{g}_A^2 = 0$ while
$\bar{g}_B^2 = 400$ (so that $\bar{g}_B^2 + S^2 = 400.5$). However, there is
little effect due to correlation when considering $M(W_n^{BL})$ for low
variance, especially for function A. In these situations CANE's, with a
moderate amount of smoothing, do significantly better than the NOCANE's
and perform with ratios of about 3.0 in relation to the BL estimator.


Consider functions A and B with n = 10. For function A the worst
case (0.39) for BL occurs for high variance and moderate correlation,
while for B, the worst case (8.307) is for high variance and high
correlation. Here we notice that, for A, NOCANE's with heavy
smoothing do much better than the CANE's (which also use moderately
heavy smoothing). For function B, the reverse is true. In this case,
however, the NOCANE's prefer very low smoothing. There is little to
choose between a NOCANE and a CANE in the worst case for B. We also
point out that for almost all levels of $\mathrm{Var}[\bar{y}]$ with function B,
the correlation makes no difference in the performance for a NOCANE,
that is, it depends almost entirely on the variance. When h is chosen
smaller than 1/n, the correlation plays no part in the MASE.

Finally we observe that the CANE estimators perform almost the
same for functions A and B. This is intuitively pleasing since we
would expect the amount of smoothing to be invariant under additive
shifts in the function. However, the smoothing of NOCANE's not only
depends on the shift, but may result in a smaller measure of error
than a CANE! From these results we see that the propensity to choose
light or heavy smoothing for the CANE's depends on which of $\sigma^2/m$ or
$\bar{\rho}_n(\sigma^2/m) + S^2$ is the larger. In particular, if
$\sigma^2/m < \bar{\rho}_n(\sigma^2/m) + S^2$ (which is equivalent to
$\sigma^2/m < S^2/(1 - \bar{\rho}_n)$), then light smoothing is preferred.
Now $S^2$ = 0.5 for both functions A and B. For n=10, $\bar{\rho}_n$ = 0.629
when $\rho_n$ = 0.8513. Hence, whenever $\sigma^2/m$ < 1.35 and
$\bar{\rho}_n$ = 0.629, the CANE's prefer light smoothing. This is borne out
in Table 3.4.1, since in this case a very large bandwidth for the CANE's
is chosen only when $\sigma^2/m$ = 10 > 1.35. We further observe that the
TABLE 1a. COMPARISONS OF OPTIMAL MASE: FUNCTION A, N=10
VARIABLE = M(h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.3981   0.8513   0.9650

10        BL: BEST          0.333    0.386    0.390    0.279
          PC: NOCANE        0.614    0.758    1.301    1.640
          GM: NOCANE        0.614    0.758    1.301    1.639
          NW: CANE          1.356    2.539    6.762    9.275
          GC: CANE          1.356    2.539    6.762    9.274

1         BL: BEST          0.083    0.127    0.131    0.056
          PC: NOCANE        0.216    0.336    0.556    0.600
          GM: NOCANE        0.217    0.337    0.566    0.600
          NW: CANE          0.202    0.364    0.787    0.959
          GC: CANE          0.202    0.364    0.787    0.959

0.1       BL: BEST          0.010    0.016    0.017    0.006
          PC: NOCANE        0.051    0.072    0.099    0.106
          GM: NOCANE        0.051    0.070    0.093    0.100
          NW: CANE          0.030    0.049    0.087    0.098
          GC: CANE          0.030    0.049    0.087    0.098

TABLE 1b. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION A, N=10
VARIABLE = (h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.3981   0.8513   0.9650

10        BL: BEST              .        .        .        .
          PC: NOCANE         2.00     2.00     2.00     2.00
          GM: NOCANE         2.00     2.00     2.00     2.00
          NW: CANE           0.87     1.11     1.31     0.76
          GC: CANE           0.87     1.11     1.31     0.77

1         BL: BEST              .        .        .        .
          PC: NOCANE         0.40     0.50     2.00     2.00
          GM: NOCANE         0.39     0.47     2.00     2.00
          NW: CANE           0.46     0.52     0.49     0.34
          GC: CANE           0.46     0.52     0.49     0.34

0.1       BL: BEST              .        .        .        .
          PC: NOCANE         0.15     0.17     0.19     0.20
          GM: NOCANE         0.14     0.13     0.10     0.01
          NW: CANE           0.29     0.31     0.25     0.19
          GC: CANE           0.28     0.30     0.25     0.17

TABLE 1c. COMPARISONS OF OPTIMAL MASE: FUNCTION A, N=25
VARIABLE = M(h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.6918   0.9377   0.9858

10        BL: BEST          0.222    0.374    0.388    0.278
          PC: NOCANE        0.508    0.740    1.296    1.638
          GM: NOCANE        0.508    0.740    1.296    1.637
          NW: CANE          0.642    2.397    6.729    9.267
          GC: CANE          0.642    2.397    6.729    9.267

1         BL: BEST          0.037    0.115    0.128    0.056
          PC: NOCANE        0.125    0.323    0.566    0.600
          GM: NOCANE        0.125    0.323    0.566    0.600
          NW: CANE          0.094    0.342    0.783    0.958
          GC: CANE          0.094    0.342    0.783    0.958

0.1       BL: BEST          0.004    0.014    0.017    0.006
          PC: NOCANE        0.033    0.070    0.084    0.084
          GM: NOCANE        0.033    0.070    0.097    0.100
          NW: CANE          0.014    0.046    0.087    0.098
          GC: CANE          0.014    0.046    0.087    0.098

TABLE 1d. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION A, N=25
VARIABLE = (h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.6918   0.9377   0.9858

10        BL: BEST              .        .        .        .
          PC: NOCANE         0.64     2.00     2.00     2.00
          GM: NOCANE         0.64     2.00     2.00     2.00
          NW: CANE           0.66     1.09     1.32     0.77
          GC: CANE           0.65     1.09     1.32     0.77

1         BL: BEST              .        .        .        .
          PC: NOCANE         0.29     0.48     2.00     2.00
          GM: NOCANE         0.29     0.46     2.00     2.00
          NW: CANE           0.38     0.51     0.49     0.34
          GC: CANE           0.38     0.51     0.49     0.34

0.1       BL: BEST              .        .        .        .
          PC: NOCANE         0.13     0.17     0.04     0.04
          GM: NOCANE         0.12     0.17     0.04     0.01
          NW: CANE           0.24     0.30     0.25     0.18
          GC: CANE           0.24     0.30     0.25     0.18

TABLE 2a. COMPARISONS OF OPTIMAL MASE: FUNCTION B, N=10
VARIABLE = M(h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.3981   0.8513   0.9650

10        BL: BEST          0.998    2.041    5.706    8.307
          PC: NOCANE        7.331    9.082    9.799    9.799
          GM: NOCANE        7.107   10.000   10.000   10.000
          NW: CANE          1.356    2.539    6.762    9.275
          GC: CANE          1.356    2.539    6.762    9.274

1         BL: BEST          0.100    0.205    0.578    0.847
          PC: NOCANE        1.052    1.052    1.052    1.052
          GM: NOCANE        1.000    1.000    1.000    1.000
          NW: CANE          0.202    0.364    0.787    0.959
          GC: CANE          0.202    0.364    0.787    0.959

0.1       BL: BEST          0.010    0.021    0.058    0.085
          PC: NOCANE        0.177    0.177    0.177    0.177
          GM: NOCANE        0.100    0.100    0.100    0.100
          NW: CANE          0.030    0.049    0.087    0.098
          GC: CANE          0.030    0.049    0.087    0.098

TABLE 2b. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION B, N=10
VARIABLE = (h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.3981   0.8513   0.9650

10        BL: BEST              .        .        .        .
          PC: NOCANE         0.12     0.12     0.08     0.08
          GM: NOCANE         0.10     0.01     0.01     0.01
          NW: CANE           0.87     1.11     1.31     0.76
          GC: CANE           0.87     1.11     1.31     0.77

1         BL: BEST              .        .        .        .
          PC: NOCANE         0.08     0.08     0.08     0.08
          GM: NOCANE         0.01     0.01     0.01     0.01
          NW: CANE           0.46     0.52     0.49     0.34
          GC: CANE           0.46     0.52     0.49     0.34

0.1       BL: BEST              .        .        .        .
          PC: NOCANE         0.08     0.08     0.08     0.08
          GM: NOCANE         0.01     0.01     0.01     0.01
          NW: CANE           0.29     0.31     0.25     0.19
          GC: CANE           0.28     0.30     0.25     0.17
TABLE 2c. COMPARISONS OF OPTIMAL MASE: FUNCTION B, N=25
VARIABLE = M(h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.6918   0.9377   0.9858

10        BL: BEST          0.400    1.852    5.549    8.233
          PC: NOCANE        6.004   10.075   11.816   12.179
          GM: NOCANE        5.625   10.000   10.000   10.000
          NW: CANE          0.642    2.397    6.729    9.267
          GC: CANE          0.642    2.397    6.729    9.267

1         BL: BEST          0.040    0.186    0.562    0.839
          PC: NOCANE        1.670    2.036    2.184    2.214
          GM: NOCANE        1.000    1.000    1.000    1.000
          NW: CANE          0.094    0.342    0.783    0.958
          GC: CANE          0.094    0.342    0.783    0.958

0.1       BL: BEST          0.004    0.019    0.056    0.084
          PC: NOCANE        1.260    1.296    1.311    1.314
          GM: NOCANE        0.100    0.100    0.100    0.100
          NW: CANE          0.014    0.046    0.087    0.098
          GC: CANE          0.014    0.046    0.087    0.098

TABLE 2d. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION B, N=25
VARIABLE = (h*; estimator)

VARIANCE                        ADJACENT CORRELATION
OF MEAN   ESTIMATOR             0   0.6918   0.9377   0.9858

10        BL: BEST              .        .        .        .
          PC: NOCANE         0.07     0.07     0.07     0.07
          GM: NOCANE         0.05     0.01     0.01     0.01
          NW: CANE           0.66     1.09     1.32     0.77
          GC: CANE           0.65     1.09     1.32     0.77

1         BL: BEST              .        .        .        .
          PC: NOCANE         0.05     0.05     0.05     0.05
          GM: NOCANE         0.01     0.01     0.01     0.01
          NW: CANE           0.38     0.51     0.49     0.34
          GC: CANE           0.38     0.51     0.49     0.34

0.1       BL: BEST              .        .        .        .
          PC: NOCANE         0.05     0.05     0.05     0.05
          GM: NOCANE         0.01     0.01     0.01     0.01
          NW: CANE           0.24     0.30     0.25     0.18
          GC: CANE           0.24     0.30     0.25     0.18

jump from light to heavy smoothing can be rather dramatic as one goes
from a small σ²/m to a large σ²/m. These results are basically the
same for both sample sizes.


Comparisons  of  Functions  C,  D. 

This  discussion  refers  to  Table  3a,  Table  3b,  Table  3c,  and  Table  3d 
for  function  C  and  Table  4a,  Table  4b,  Table  4c,  and  Table  4d  for 
function  D. 

For function C, we observe that BL tends to be very good when
both the variance and the correlation are low. There is a tendency
toward very light smoothing in NOCANE's and moderate smoothing in
CANE's, which perform slightly better than NOCANE's. The same is
generally true for function D with the exception that the NOCANE's,
with moderate smoothing, edge the CANE's (also under moderate
smoothing).

The  worst  cases  for  BL  are  for  high  correlations  and  variances 
for  both  functions.  Here  the  NOCANE's  with  moderate  to  very  heavy 
smoothing  outperform  the  CANE's  and  do  reasonably  well  relative  to 
BL,  more  so  for  function  C  than  for  function  D.  The  CANE's  also 
prefer  moderate  to  very  heavy  smoothing.  The  performance  of  all 
estimators  is  more  dependent  on  the  variance  than  the  correlation. 


TABLE 3a. COMPARISONS OF OPTIMAL MASE: FUNCTION C, N = 10
VARIABLE = M(h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.3981    0.8513    0.9650

10        BL: BEST        0.928     1.748     3.621     3.465
          PC: NOCANE      1.926     2.786     4.863     5.826
          GM: NOCANE      1.932     2.786     4.864     5.826
          NW: CANE        1.452     2.688     6.931     9.343
          GC: CANE        1.452     2.688     6.931     9.342

1         BL: BEST        0.099     0.199     0.483     0.456
          PC: NOCANE      0.565     0.771     1.048     0.930
          GM: NOCANE      0.545     0.734     0.943     1.000
          NW: CANE        0.227     0.395     0.814     0.970
          GC: CANE        0.227     0.395     0.814     0.970

0.1       BL: BEST        0.010     0.020     0.050     0.047
          PC: NOCANE      0.100     0.100     0.100     0.100
          GM: NOCANE      0.100     0.100     0.100     0.100
          NW: CANE        0.037     0.058     0.092     0.099
          GC: CANE        0.037     0.058     0.092     0.099

TABLE 3b. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION C, N = 10
VARIABLE = (h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.3981    0.8513    0.9650

10        BL: BEST        .         .         .         .
          PC: NOCANE      0.54      0.67      0.96      1.11
          GM: NOCANE      0.53      0.69      0.96      1.11
          NW: CANE        0.77      0.95      1.02      0.66
          GC: CANE        0.77      0.95      1.03      0.67

1         BL: BEST        .         .         .         .
          PC: NOCANE      0.16      0.17      0.18      0.08
          GM: NOCANE      0.13      0.12      0.09      0.01
          NW: CANE        0.43      0.47      0.44      0.26
          GC: CANE        0.42      0.47      0.44      0.27

0.1       BL: BEST        .         .         .         .
          PC: NOCANE      0.08      0.08      0.08      0.08
          GM: NOCANE      0.01      0.01      0.01      0.01
          NW: CANE        0.24      0.24      0.15      0.11
          GC: CANE        0.23      0.24      0.18      0.10

TABLE 3c. COMPARISONS OF OPTIMAL MASE: FUNCTION C, N = 25
VARIABLE = M(h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.6918    0.9377    0.9858

10        BL: BEST        0.388     1.598     3.513     3.363
          PC: NOCANE      1.245     2.705     4.858     5.829
          GM: NOCANE      1.246     2.705     4.858     5.829
          NW: CANE        0.693     2.544     6.903     9.336
          GC: CANE        0.693     2.544     6.903     9.336

1         BL: BEST        0.040     0.180     0.464     0.438
          PC: NOCANE      0.383     0.768     0.983     1.018
          GM: NOCANE      0.385     0.768     1.000     1.000
          NW: CANE        0.111     0.373     0.810     0.969
          GC: CANE        0.111     0.373     0.810     0.969

0.1       BL: BEST        0.004     0.018     0.048     0.045
          PC: NOCANE      0.089     0.167     0.185     0.188
          GM: NOCANE      0.100     0.100     0.100     0.100
          NW: CANE        0.018     0.054     0.091     0.099
          GC: CANE        0.018     0.054     0.091     0.099

TABLE 3d. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION C, N = 25
VARIABLE = (h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.6918    0.9377    0.9858

10        BL: BEST        .         .         .         .
          PC: NOCANE      0.34      0.67      0.96      1.11
          GM: NOCANE      0.35      0.67      0.96      1.11
          NW: CANE        0.60      0.93      1.03      0.66
          GC: CANE        0.60      0.93      1.03      0.67

1         BL: BEST        .         .         .         .
          PC: NOCANE      0.10      0.17      0.08      0.08
          GM: NOCANE      0.10      0.16      0.01      0.01
          NW: CANE        0.34      0.46      0.43      0.27
          GC: CANE        0.34      0.46      0.44      0.27

0.1       BL: BEST        .         .         .         .
          PC: NOCANE      0.05      0.07      0.07      0.07
          GM: NOCANE      0.01      0.01      0.01      0.01
          NW: CANE        0.18      0.24      0.18      0.09
          GC: CANE        0.19      0.24      0.18      0.09

TABLE 4a. COMPARISONS OF OPTIMAL MASE: FUNCTION D, N = 10
VARIABLE = M(h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.3981    0.8513    0.9650

10        BL: BEST        0.249     0.283     0.303     0.283
          PC: NOCANE      0.312     0.456     0.999     1.337
          GM: NOCANE      0.312     0.456     0.999     1.337
          NW: CANE        1.077     2.191     6.390     9.006
          GC: CANE        1.077     2.191     6.390     9.006

1         BL: BEST        0.077     0.122     0.169     0.120
          PC: NOCANE      0.128     0.186     0.264     0.298
          GM: NOCANE      0.128     0.186     0.264     0.298
          NW: CANE        0.148     0.273     0.698     0.936
          GC: CANE        0.148     0.273     0.698     0.936

0.1       BL: BEST        0.010     0.018     0.031     0.018
          PC: NOCANE      0.037     0.053     0.085     0.090
          GM: NOCANE      0.037     0.053     0.086     0.095
          NW: CANE        0.023     0.040     0.082     0.097
          GC: CANE        0.023     0.040     0.082     0.097

TABLE 4b. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION D, N = 10
VARIABLE = (h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.3981    0.8513    0.9650

10        BL: BEST        .         .         .         .
          PC: NOCANE      2.00      2.00      2.00      2.00
          GM: NOCANE      2.00      2.00      2.00      2.00
          NW: CANE        1.66      2.00      2.00      2.00
          GC: CANE        1.66      2.00      2.00      2.00

1         BL: BEST        .         .         .         .
          PC: NOCANE      0.71      1.05      2.00      2.00
          GM: NOCANE      0.70      1.05      2.00      2.00
          NW: CANE        0.75      0.92      0.98      0.64
          GC: CANE        0.75      0.92      0.98      0.64

0.1       BL: BEST        .         .         .         .
          PC: NOCANE      0.25      0.34      0.30      0.20
          GM: NOCANE      0.25      0.32      0.32      0.11
          NW: CANE        0.42      0.46      0.43      0.25
          GC: CANE        0.41      0.46      0.42      0.25
TABLE 4c. COMPARISONS OF OPTIMAL MASE: FUNCTION D, N = 25
VARIABLE = M(h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.6918    0.9377    0.9858

10        BL: BEST        0.182     0.279     0.302     0.280
          PC: NOCANE      0.234     0.439     0.995     1.336
          GM: NOCANE      0.234     0.438     0.995     1.336
          NW: CANE        0.471     2.052     6.357     8.998
          GC: CANE        0.471     2.052     6.357     8.998

1         BL: BEST        0.036     0.112     0.164     0.116
          PC: NOCANE      0.080     0.181     0.264     0.298
          GM: NOCANE      0.080     0.181     0.264     0.298
          NW: CANE        0.071     0.259     0.695     0.935
          GC: CANE        0.071     0.259     0.695     0.935

0.1       BL: BEST        0.004     0.016     0.029     0.017
          PC: NOCANE      0.024     0.052     0.086     0.096
          GM: NOCANE      0.024     0.052     0.086     0.097
          NW: CANE        0.011     0.038     0.082     0.097
          GC: CANE        0.011     0.038     0.082     0.097

TABLE 4d. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION D, N = 25
VARIABLE = (h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.6918    0.9377    0.9858

10        BL: BEST        .         .         .         .
          PC: NOCANE      1.75      2.00      2.00      2.00
          GM: NOCANE      1.75      2.00      2.00      2.00
          NW: CANE        1.16      2.00      2.00      2.00
          GC: CANE        1.16      2.00      2.00      2.00

1         BL: BEST        .         .         .         .
          PC: NOCANE      0.49      0.99      2.00      2.00
          GM: NOCANE      0.49      0.99      2.00      2.00
          NW: CANE        0.59      0.90      0.98      0.64
          GC: CANE        0.59      0.90      0.98      0.64

0.1       BL: BEST        .         .         .         .
          PC: NOCANE      0.18      0.31      0.32      0.20
          GM: NOCANE      0.18      0.31      0.33      0.20
          NW: CANE        0.33      0.45      0.42      0.26
          GC: CANE        0.33      0.45      0.42      0.26
Results for Function E.

In this discussion we refer to Table 5a, Table 5b, Table 5c, and
Table 5d for function E, an ordinary sinusoid.

For function E it is very interesting to note the similarity
with the A-results for the BL MASE, which is identical when ρ = 0
over both sample sizes. We observe, however, that the NOCANE's are at
least slightly better than the CANE's for all combinations of variance
and correlation. The performance of the NOCANE's and CANE's is
expected to be comparable when we are considering a function for
which g(0) = g(1) = 0.
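
The role of the boundary condition g(0) = g(1) = 0 can be illustrated by comparing kernel weights with and without normalization. The sketch below is illustrative only: it assumes an Epanechnikov kernel, an equally spaced midpoint design on [0, 1], a Priestley-Chao-type form for the non-normalized weights, and a Nadaraya-Watson-type form for the normalized weights. The function names are hypothetical and are not taken from the study.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1] (an assumed kernel choice)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def pc_weights(x_eval, x_design, h):
    """Non-normalized weights K((x - x_t) / h) / (n h) on an equally spaced design."""
    n = len(x_design)
    u = (x_eval[:, None] - x_design[None, :]) / h
    return epanechnikov(u) / (n * h)

def nw_weights(x_eval, x_design, h):
    """The same kernel values, renormalized so each row sums to one."""
    u = (x_eval[:, None] - x_design[None, :]) / h
    k = epanechnikov(u)
    return k / k.sum(axis=1, keepdims=True)

n = 25
x = (np.arange(1, n + 1) - 0.5) / n   # hypothetical midpoint design on [0, 1]
h = 0.2

pc_row_sums = pc_weights(x, x, h).sum(axis=1)
nw_row_sums = nw_weights(x, x, h).sum(axis=1)

# Near the boundary the non-normalized weights lose mass, which shrinks the
# estimate toward zero; the normalized weights always sum to one.
print("non-normalized row sums (first, middle):", pc_row_sums[0], pc_row_sums[n // 2])
print("normalized row sums (first, middle):    ", nw_row_sums[0], nw_row_sums[n // 2])
```

Under these assumptions the non-normalized weights sum to well below one at the first design point and to nearly one in the interior. Shrinkage toward zero at the boundary does little harm when the regression function itself vanishes there, which is one way to read the expectation stated above.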


3.5 Summary

In summary we make the following observations:

(1) Whether to cut-and-normalize depends not only on boundary
conditions, but also on considerations about the relative sizes
of V(y), p, g2 and S2.

(2) For a given function, there are only minor differences
between the two cut-and-normalized estimators and between
the two NOCANE's. However, the difference between the two is
not negligible asymptotically (see Gasser and Müller (1979)).
TABLE 5a. COMPARISONS OF OPTIMAL MASE: FUNCTION E, N = 10
VARIABLE = M(h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.3981    0.8513    0.9650

10        BL: BEST        0.333     0.387     0.329     0.158
          PC: NOCANE      0.620     0.764     1.307     1.645
          GM: NOCANE      0.620     0.764     1.306     1.644
          NW: CANE        1.439     2.583     6.787     9.375
          GC: CANE        1.439     2.583     6.787     9.375

1         BL: BEST        0.083     0.127     0.080     0.022
          PC: NOCANE      0.223     0.358     0.572     0.606
          GM: NOCANE      0.222     0.362     0.572     0.606
          NW: CANE        0.290     0.471     0.877     0.985
          GC: CANE        0.290     0.470     0.877     0.985

0.1       BL: BEST        0.010     0.016     0.009     0.002
          PC: NOCANE      0.039     0.059     0.084     0.091
          GM: NOCANE      0.039     0.059     0.086     0.094
          NW: CANE        0.051     0.072     0.096     0.100
          GC: CANE        0.051     0.072     0.096     0.100

TABLE 5b. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION E, N = 10
VARIABLE = (h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.3981    0.8513    0.9650

10        BL: BEST        .         .         .         .
          PC: NOCANE      2.00      2.00      2.00      2.00
          GM: NOCANE      2.00      2.00      2.00      2.00
          NW: CANE        1.04      1.42      2.00      1.02
          GC: CANE        1.04      1.42      2.00      1.02

1         BL: BEST        .         .         .         .
          PC: NOCANE      0.32      0.40      2.00      2.00
          GM: NOCANE      0.33      0.39      2.00      2.00
          NW: CANE        0.34      0.43      0.33      0.13
          GC: CANE        0.34      0.42      0.32      0.12

0.1       BL: BEST        .         .         .         .
          PC: NOCANE      0.17      0.18      0.20      0.20
          GM: NOCANE      0.18      0.18      0.16      0.13
          NW: CANE        0.15      0.13      0.11      0.01
          GC: CANE        0.15      0.14      0.09      0.01

TABLE 5c. COMPARISONS OF OPTIMAL MASE: FUNCTION E, N = 25
VARIABLE = M(h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.6918    0.9377    0.9858

10        BL: BEST        0.222     0.377     0.312     0.141
          PC: NOCANE      0.542     0.746     1.302     1.643
          GM: NOCANE      0.542     0.746     1.302     1.643
          NW: CANE        0.783     2.446     6.755     9.372
          GC: CANE        0.783     2.446     6.755     9.372

1         BL: BEST        0.037     0.118     0.071     0.019
          PC: NOCANE      0.114     0.345     0.572     0.606
          GM: NOCANE      0.114     0.346     0.572     0.606
          NW: CANE        0.151     0.451     0.875     0.985
          GC: CANE        0.151     0.451     0.875     0.985

0.1       BL: BEST        0.004     0.015     0.008     0.002
          PC: NOCANE      0.019     0.055     0.084     0.092
          GM: NOCANE      0.020     0.055     0.085     0.093
          NW: CANE        0.027     0.068     0.096     0.100
          GC: CANE        0.026     0.068     0.096     0.100

TABLE 5d. COMPARISONS OF OPTIMAL BANDWIDTH: FUNCTION E, N = 25
VARIABLE = (h*; estimator)

VARIANCE                  ADJACENT CORRELATION
OF MEAN   ESTIMATOR       0         0.6918    0.9377    0.9858

10        BL: BEST        .         .         .         .
          PC: NOCANE      2.00      2.00      2.00      2.00
          GM: NOCANE      2.00      2.00      2.00      2.00
          NW: CANE        0.66      1.44      2.00      1.05
          GC: CANE        0.65      1.44      2.00      1.05

1         BL: BEST        .         .         .         .
          PC: NOCANE      0.26      0.36      2.00      2.00
          GM: NOCANE      0.26      0.38      2.00      2.00
          NW: CANE        0.25      0.39      0.30      0.13
          GC: CANE        0.25      0.39      0.30      0.13

0.1       BL: BEST        .         .         .         .
          PC: NOCANE      0.14      0.18      0.16      0.16
          GM: NOCANE      0.15      0.18      0.15      0.13
          NW: CANE        0.13      0.14      0.09      0.01
          GC: CANE        0.13      0.14      0.09      0.01
(3) The amount of smoothing for a given estimator generally
increases with increasing correlation and variance. For a
fixed variance, the requisite amount of smoothing tends to
increase with correlation and decrease for larger p. This is
consistent with the findings of Hart and Wehrly (1986).
However, for a given p and V(y), a CANE may choose
significantly more or less smoothing than a NOCANE.

(4) As evidenced by functions A and B, the efficiency of a
kernel estimator (when using the MASE criterion) depends heavily
on how pnV(y) compares with g2 and S2.

(5) If one ignores correlation when it actually is present,
then, for a given estimator, there would probably be
undersmoothing if the variance is high. If the variance is
low then there can be either undersmoothing or oversmoothing.

(6)  There  seems  to  be  a  need  to  modify  the  estimator  to  be  a 
function  of  the  estimated  correlation  in  certain  situations 
where  the  BL  estimate  performs  significantly  better  (e.g. 
function  A  with  high  correlation). 
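
Observations (3) and (5) can be explored numerically with a small MASE calculation. The following is a minimal sketch and not the computation used for the tables: it assumes a normalized Epanechnikov kernel estimator, a covariance of the subject means of the form V(y) times the adjacent correlation raised to the lag, MASE taken as the average over design points of squared bias plus variance, and a hypothetical test function sin(pi x). All function names are illustrative.

```python
import numpy as np

def nw_weight_matrix(x, h):
    """Kernel weights renormalized to sum to one at each design point."""
    u = (x[:, None] - x[None, :]) / h
    k = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov (assumed)
    return k / k.sum(axis=1, keepdims=True)

def mase(g_vals, weights, var_mean, rho):
    """Average over design points of squared bias plus variance of weights @ ybar."""
    n = len(g_vals)
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    cov = var_mean * np.power(float(rho), lags)  # assumed covariance of the means
    bias = weights @ g_vals - g_vals
    variance = np.einsum("ij,jk,ik->i", weights, cov, weights)
    return float(np.mean(bias**2 + variance))

def optimal_bandwidth(g_vals, x, var_mean, rho, grid):
    """Grid search for the bandwidth that minimizes MASE."""
    scores = [mase(g_vals, nw_weight_matrix(x, h), var_mean, rho) for h in grid]
    best = int(np.argmin(scores))
    return float(grid[best]), float(scores[best])

n = 25
x = (np.arange(1, n + 1) - 0.5) / n   # hypothetical midpoint design
g_vals = np.sin(np.pi * x)            # hypothetical regression function
grid = np.linspace(0.01, 2.0, 200)
var_mean, rho_true = 1.0, 0.9377

# Bandwidth chosen with the correct correlation versus one chosen as if rho = 0,
# both evaluated under the true correlation.
h_true, mase_true = optimal_bandwidth(g_vals, x, var_mean, rho_true, grid)
h_naive, _ = optimal_bandwidth(g_vals, x, var_mean, 0.0, grid)
mase_naive = mase(g_vals, nw_weight_matrix(x, h_naive), var_mean, rho_true)
print(h_true, mase_true, h_naive, mase_naive)
```

With a bandwidth below the design spacing the weight matrix reduces to the identity, so the MASE equals the variance of the mean. This is consistent with table entries in which a NOCANE reports a bandwidth of 0.01 and a MASE equal to the variance, although that reading is an inference from this sketch and not a statement from the study.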

It appears that the presence of large correlation, especially
with a large variance, tends to produce a very large bandwidth when
using the MASE criterion. Large bandwidths tend to produce nearly
flat, featureless estimates. Of course the variance of the estimator
is decreased, but the bias may be rather large, especially near the
boundaries. A very large positive correlation translates into sample
paths very nearly parallel to the population regression function. If
the variance is small, the tendency would be toward a small
bandwidth; a large variance would tend to call for over-smoothing.
This is often the best a kernel estimate can do to correct for the
fact that the estimate will tend to stay on one side of the
regression function. For the most part, a flat (over-smoothed)
estimate will stay on the same side of any continuous regression
function g, but may at least cross g. Hence it seems natural for a
kernel estimate to select, at the expense of the bias, a large
bandwidth in these situations. It is conjectured that a variable
bandwidth method (e.g. nearest neighbor estimation) may prove to be a
more satisfactory estimate.
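
The trade-off between variance and bias for large bandwidths can be written out with a standard decomposition. This is a sketch under stated assumptions: a linear estimator of the subject means with weights w_t(x; h), a common variance V(y-bar) of the means, and a correlation function written here as rho, a notation chosen for illustration.

```latex
% Assumed form of the estimator: \hat g(x) = \sum_t w_t(x;h)\, \bar y(x_t)
\[
E\bigl[\hat g(x) - g(x)\bigr]^2
  = \Bigl(\sum_{t=1}^{n} w_t(x;h)\, g(x_t) - g(x)\Bigr)^2
  + V(\bar y) \sum_{t=1}^{n}\sum_{v=1}^{n} w_t(x;h)\, w_v(x;h)\, \rho(x_t - x_v).
\]
% Flat weights w_t = 1/n, with \rho(0) = 1 and 0 \le \rho \le 1:
\[
\frac{V(\bar y)}{n}
  \;\le\; \frac{V(\bar y)}{n^2} \sum_{t=1}^{n}\sum_{v=1}^{n} \rho(x_t - x_v)
  \;\le\; V(\bar y).
\]
```

The lower bound corresponds to uncorrelated errors and the upper bound to perfectly correlated errors. Under these assumptions, the closer the correlation is to one, the less a normalized average across design points can reduce the variance term.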